stb/stb_image.h

   1 /* stb_image - v2.27 - public domain image loader - http://nothings.org/stb
   2                                   no warranty implied; use at your own risk
   3
   4    Do this:
   5       #define STB_IMAGE_IMPLEMENTATION
   6    before you include this file in *one* C or C++ file to create the implementation.
   7
   8    // i.e. it should look like this:
   9    #include ...
  10    #include ...
  11    #include ...
  12    #define STB_IMAGE_IMPLEMENTATION
  13    #include "stb_image.h"
  14
  15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
  16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
  17
  18
  19    QUICK NOTES:
  20       Primarily of interest to game developers and other people who can
  21           avoid problematic images and only need the trivial interface
  22
  23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
  24       PNG 1/2/4/8/16-bit-per-channel
  25
  26       TGA (not sure what subset, if a subset)
  27       BMP non-1bpp, non-RLE
  28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
  29
  30       GIF (*comp always reports as 4-channel)
  31       HDR (radiance rgbE format)
  32       PIC (Softimage PIC)
  33       PNM (PPM and PGM binary only)
  34
  35       Animated GIF still needs a proper API, but here's one way to do it:
  36           http://gist.github.com/urraka/685d9a6340b26b830d49
  37
  38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
  39       - decode from arbitrary I/O callbacks
  40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
  41
  42    Full documentation under "DOCUMENTATION" below.
  43
  44
  45 LICENSE
  46
  47   See end of file for license information.
  48
  49 RECENT REVISION HISTORY:
  50
  51       2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
  52       2.26  (2020-07-13) many minor fixes
  53       2.25  (2020-02-02) fix warnings
  54       2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
  55       2.23  (2019-08-11) fix clang static analysis warning
  56       2.22  (2019-03-04) gif fixes, fix warnings
  57       2.21  (2019-02-25) fix typo in comment
  58       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
  59       2.19  (2018-02-11) fix warning
  60       2.18  (2018-01-30) fix warnings
  61       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
  62       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
  63       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
  64       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
  65       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
  66       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
  67       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
  68                          RGB-format JPEG; remove white matting in PSD;
  69                          allocate large structures on the stack;
  70                          correct channel count for PNG & BMP
  71       2.10  (2016-01-22) avoid warning introduced in 2.09
  72       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
  73
  74    See end of file for full revision history.
  75
  76
  77  ============================    Contributors    =========================
  78
  79  Image formats                          Extensions, features
  80     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
  81     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
  82     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
  83     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
  84     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
  85     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
  86     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
  87     github:urraka (animated gif)           Junggon Kim (PNM comments)
  88     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
  89                                            socks-the-fox (16-bit PNG)
  90                                            Jeremy Sawicki (handle all ImageNet JPGs)
  91  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
  92     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
  93     Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
  94     John-Mark Allen
  95     Carmelo J Fdez-Aguera
  96
  97  Bug & warning fixes
  98     Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
  99     Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
 100     Phil Jordan                                Dave Moore           Roy Eltham
 101     Hayaki Saito            Nathan Reed        Won Chun
 102     Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
 103     Thomas Ruf              Ronny Chevalier                         github:rlyeh
 104     Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
 105     Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
 106     Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
 107     Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
 108     Cass Everitt            Ryamond Barbiero                        github:grim210
 109     Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
 110     Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
 111     Josh Tobin                                 Matthew Gregan       github:poppolopoppo
 112     Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
 113     Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
 114                             Brad Weinberger    Matvey Cherevko      github:mosra
 115     Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
 116     Ryan C. Gordon          [reserved]                              [reserved]
 117                      DO NOT ADD YOUR NAME HERE
 118
 119                      Jacko Dirks
 120
 121   To add your name to the credits, pick a random blank space in the middle and fill it.
 122   80% of merge conflicts on stb PRs are due to people adding their name at the end
 123   of the credits.
 124 */
 125
 126 #ifndef STBI_INCLUDE_STB_IMAGE_H
 127 #define STBI_INCLUDE_STB_IMAGE_H
 128
 129 // DOCUMENTATION
 130 //
 131 // Limitations:
 132 //    - no 12-bit-per-channel JPEG
 133 //    - no JPEGs with arithmetic coding
 134 //    - GIF always returns *comp=4
 135 //
 136 // Basic usage (see HDR discussion below for HDR usage):
 137 //    int x,y,n;
 138 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
 139 //    // ... process data if not NULL ...
 140 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
 141 //    // ... replace '0' with '1'..'4' to force that many components per pixel
 142 //    // ... but 'n' will always be the number that it would have been if you said 0
 143 //    stbi_image_free(data)
 144 //
 145 // Standard parameters:
 146 //    int *x                 -- outputs image width in pixels
 147 //    int *y                 -- outputs image height in pixels
 148 //    int *channels_in_file  -- outputs # of image components in image file
 149 //    int desired_channels   -- if non-zero, # of image components requested in result
 150 //
 151 // The return value from an image loader is an 'unsigned char *' which points
 152 // to the pixel data, or NULL on an allocation failure or if the image is
 153 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
 154 // with each pixel consisting of N interleaved 8-bit components; the first
 155 // pixel pointed to is top-left-most in the image. There is no padding between
 156 // image scanlines or between pixels, regardless of format. The number of
 157 // components N is 'desired_channels' if desired_channels is non-zero, or
 158 // *channels_in_file otherwise. If desired_channels is non-zero,
 159 // *channels_in_file has the number of components that _would_ have been
 160 // output otherwise. E.g. if you set desired_channels to 4, you will always
 161 // get RGBA output, but you can check *channels_in_file to see if it's trivially
 162 // opaque because e.g. there were only 3 channels in the source image.
 163 //
 164 // An output image with N components has the following components interleaved
 165 // in this order in each pixel:
 166 //
 167 //     N=#comp     components
 168 //       1           grey
 169 //       2           grey, alpha
 170 //       3           red, green, blue
 171 //       4           red, green, blue, alpha
 172 //
 173 // If image loading fails for any reason, the return value will be NULL,
 174 // and *x, *y, *channels_in_file will be unchanged. The function
 175 // stbi_failure_reason() can be queried for an extremely brief, end-user
 176 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
 177 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
 178 // more user-friendly ones.
 179 //
 180 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
 181 //
 182 // To query the width, height and component count of an image without having to
 183 // decode the full file, you can use the stbi_info family of functions:
 184 //
 185 //   int x,y,n,ok;
 186 //   ok = stbi_info(filename, &x, &y, &n);
 187 //   // returns ok=1 and sets x, y, n if image is a supported format,
 188 //   // 0 otherwise.
 189 //
 190 // Note that stb_image pervasively uses ints in its public API for sizes,
 191 // including sizes of memory buffers. This is now part of the API and thus
 192 // hard to change without causing breakage. As a result, the various image
 193 // loaders all have certain limits on image size; these differ somewhat
 194 // by format but generally boil down to either just under 2GB or just under
 195 // 1GB. When the decoded image would be larger than this, stb_image decoding
 196 // will fail.
 197 //
 198 // Additionally, stb_image will reject image files that have any of their
 199 // dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
 200 // which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
 201 // the only way to have an image with such dimensions load correctly
 202 // is for it to have a rather extreme aspect ratio. Either way, the
 203 // assumption here is that such larger images are likely to be malformed
 204 // or malicious. If you do need to load an image with individual dimensions
 205 // larger than that, and it still fits in the overall size limit, you can
 206 // #define STBI_MAX_DIMENSIONS on your own to be something larger.
 207 //
 208 // ===========================================================================
 209 //
 210 // UNICODE:
 211 //
 212 //   If compiling for Windows and you wish to use Unicode filenames, compile
 213 //   with
 214 //       #define STBI_WINDOWS_UTF8
 215 //   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
 216 //   Windows wchar_t filenames to utf8.
 217 //
 218 // ===========================================================================
 219 //
 220 // Philosophy
 221 //
 222 // stb libraries are designed with the following priorities:
 223 //
 224 //    1. easy to use
 225 //    2. easy to maintain
 226 //    3. good performance
 227 //
 228 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
 229 // and for best performance I may provide less-easy-to-use APIs that give higher
 230 // performance, in addition to the easy-to-use ones. Nevertheless, it's important
 231 // to keep in mind that from the standpoint of you, a client of this library,
 232 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
 233 //
 234 // Some secondary priorities arise directly from the first two, some of which
 235 // provide more explicit reasons why performance can't be emphasized.
 236 //
 237 //    - Portable ("ease of use")
 238 //    - Small source code footprint ("easy to maintain")
 239 //    - No dependencies ("ease of use")
 240 //
 241 // ===========================================================================
 242 //
 243 // I/O callbacks
 244 //
 245 // I/O callbacks allow you to read from arbitrary sources, like packaged
 246 // files or some other source. Data read from callbacks are processed
 247 // through a small internal buffer (currently 128 bytes) to try to reduce
 248 // overhead.
 249 //
 250 // The three functions you must define are "read" (reads some bytes of data),
 251 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
 252 //
 253 // ===========================================================================
 254 //
 255 // SIMD support
 256 //
 257 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
 258 // supported by the compiler. For ARM Neon support, you must explicitly
 259 // request it.
 260 //
 261 // (The old do-it-yourself SIMD API is no longer supported in the current
 262 // code.)
 263 //
 264 // On x86, SSE2 will automatically be used when available based on a run-time
 265 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
 266 // the typical path is to have separate builds for NEON and non-NEON devices
 267 // (at least this is true for iOS and Android). Therefore, the NEON support is
 268 // toggled by a build flag: define STBI_NEON to get NEON loops.
 269 //
 270 // If for some reason you do not want to use any of SIMD code, or if
 271 // you have issues compiling it, you can disable it entirely by
 272 // defining STBI_NO_SIMD.
 273 //
 274 // ===========================================================================
 275 //
 276 // HDR image support   (disable by defining STBI_NO_HDR)
 277 //
 278 // stb_image supports loading HDR images in general, and currently the Radiance
 279 // .HDR file format specifically. You can still load any file through the existing
 280 // interface; if you attempt to load an HDR file, it will be automatically remapped
 281 // to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
 282 // both of these constants can be reconfigured through this interface:
 283 //
 284 //     stbi_hdr_to_ldr_gamma(2.2f);
 285 //     stbi_hdr_to_ldr_scale(1.0f);
 286 //
 287 // (note, do not use _inverse_ constants; stbi_image will invert them
 288 // appropriately).
 289 //
 290 // Additionally, there is a new, parallel interface for loading files as
 291 // (linear) floats to preserve the full dynamic range:
 292 //
 293 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
 294 //
 295 // If you load LDR images through this interface, those images will
 296 // be promoted to floating point values, run through the inverse of
 297 // constants corresponding to the above:
 298 //
 299 //     stbi_ldr_to_hdr_scale(1.0f);
 300 //     stbi_ldr_to_hdr_gamma(2.2f);
 301 //
 302 // Finally, given a filename (or an open file or memory block--see header
 303 // file for details) containing image data, you can query for the "most
 304 // appropriate" interface to use (that is, whether the image is HDR or
 305 // not), using:
 306 //
 307 //     stbi_is_hdr(char *filename);
 308 //
 309 // ===========================================================================
 310 //
 311 // iPhone PNG support:
 312 //
 313 // We optionally support converting iPhone-formatted PNGs (which store
 314 // premultiplied BGRA) back to RGB, even though they're internally encoded
 315 // differently. To enable this conversion, call
 316 // stbi_convert_iphone_png_to_rgb(1).
 317 //
 318 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
 319 // pixel to remove any premultiplied alpha *only* if the image file explicitly
 320 // says there's premultiplied data (currently only happens in iPhone images,
 321 // and only if iPhone convert-to-rgb processing is on).
 322 //
 323 // ===========================================================================
 324 //
 325 // ADDITIONAL CONFIGURATION
 326 //
 327 //  - You can suppress implementation of any of the decoders to reduce
 328 //    your code footprint by #defining one or more of the following
 329 //    symbols before creating the implementation.
 330 //
 331 //        STBI_NO_JPEG
 332 //        STBI_NO_PNG
 333 //        STBI_NO_BMP
 334 //        STBI_NO_PSD
 335 //        STBI_NO_TGA
 336 //        STBI_NO_GIF
 337 //        STBI_NO_HDR
 338 //        STBI_NO_PIC
 339 //        STBI_NO_PNM   (.ppm and .pgm)
 340 //
 341 //  - You can request *only* certain decoders and suppress all other ones
 342 //    (this will be more forward-compatible, as addition of new decoders
 343 //    doesn't require you to disable them explicitly):
 344 //
 345 //        STBI_ONLY_JPEG
 346 //        STBI_ONLY_PNG
 347 //        STBI_ONLY_BMP
 348 //        STBI_ONLY_PSD
 349 //        STBI_ONLY_TGA
 350 //        STBI_ONLY_GIF
 351 //        STBI_ONLY_HDR
 352 //        STBI_ONLY_PIC
 353 //        STBI_ONLY_PNM   (.ppm and .pgm)
 354 //
 355 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
 356 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
 357 //
 358 //  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
 359 //    than that size (in either width or height) without further processing.
 360 //    This is to let programs in the wild set an upper bound to prevent
 361 //    denial-of-service attacks on untrusted data, as one could generate a
 362 //    valid image of gigantic dimensions and force stb_image to allocate a
 363 //    huge block of memory and spend disproportionate time decoding it. By
 364 //    default this is set to (1 << 24), which is 16777216, but that's still
 365 //    very big.
 366
 367 #ifndef STBI_NO_STDIO
 368 #include <stdio.h>
 369 #endif // STBI_NO_STDIO
 370
 371 #define STBI_VERSION 1
 372
 373 enum
 374 {
 375    STBI_default = 0, // only used for desired_channels
 376
 377    STBI_grey       = 1,
 378    STBI_grey_alpha = 2,
 379    STBI_rgb        = 3,
 380    STBI_rgb_alpha  = 4
 381 };
 382
 383 #include <stdlib.h>
 384 typedef unsigned char stbi_uc;
 385 typedef unsigned short stbi_us;
 386
 387 #ifdef __cplusplus
 388 extern "C" {
 389 #endif
 390
 391 #ifndef STBIDEF
 392 #ifdef STB_IMAGE_STATIC
 393 #define STBIDEF static
 394 #else
 395 #define STBIDEF extern
 396 #endif
 397 #endif
 398
 399 //////////////////////////////////////////////////////////////////////////////
 400 //
 401 // PRIMARY API - works on images of any type
 402 //
 403
 404 //
 405 // load image by filename, open file, or memory buffer
 406 //
 407
 408 typedef struct
 409 {
 410    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
 411    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
 412    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
 413 } stbi_io_callbacks;
 414
 415 ////////////////////////////////////
 416 //
 417 // 8-bits-per-channel interface
 418 //
 419
 420 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
 421 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 422
 423 #ifndef STBI_NO_STDIO
 424 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 425 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 426 // for stbi_load_from_file, file pointer is left pointing immediately after image
 427 #endif
 428
 429 #ifndef STBI_NO_GIF
 430 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 431 #endif
 432
 433 #ifdef STBI_WINDOWS_UTF8
 434 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
 435 #endif
 436
 437 ////////////////////////////////////
 438 //
 439 // 16-bits-per-channel interface
 440 //
 441
 442 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 443 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 444
 445 #ifndef STBI_NO_STDIO
 446 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 447 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 448 #endif
 449
 450 ////////////////////////////////////
 451 //
 452 // float-per-channel interface
 453 //
 454 #ifndef STBI_NO_LINEAR
 455    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 456    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
 457
 458    #ifndef STBI_NO_STDIO
 459    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 460    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 461    #endif
 462 #endif
 463
 464 #ifndef STBI_NO_HDR
 465    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
 466    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
 467 #endif // STBI_NO_HDR
 468
 469 #ifndef STBI_NO_LINEAR
 470    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
 471    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
 472 #endif // STBI_NO_LINEAR
 473
 474 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
 475 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 476 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
 477 #ifndef STBI_NO_STDIO
 478 STBIDEF int      stbi_is_hdr          (char const *filename);
 479 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
 480 #endif // STBI_NO_STDIO
 481
 482
 483 // get a VERY brief reason for failure
 484 // on most compilers (and ALL modern mainstream compilers) this is threadsafe
 485 STBIDEF const char *stbi_failure_reason  (void);
 486
 487 // free the loaded image -- this is just free()
 488 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
 489
 490 // get image dimensions & components without fully decoding
 491 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
 492 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
 493 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
 494 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 495
 496 #ifndef STBI_NO_STDIO
 497 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
 498 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
 499 STBIDEF int      stbi_is_16_bit          (char const *filename);
 500 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
 501 #endif
 502
 503
 504
 505 // for image formats that explicitly notate that they have premultiplied alpha,
 506 // we just return the colors as stored in the file. set this flag to force
 507 // unpremultiplication. results are undefined if the unpremultiply overflow.
 508 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
 509
 510 // indicate whether we should process iphone images back to canonical format,
 511 // or just pass them through "as-is"
 512 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
 513
 514 // flip the image vertically, so the first pixel in the output array is the bottom left
 515 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
 516
 517 // as above, but only applies to images loaded on the thread that calls the function
 518 // this function is only available if your compiler supports thread-local variables;
 519 // calling it will fail to link if your compiler doesn't
 520 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
 521 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
 522 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
 523
 524 // ZLIB client - used by PNG, available for other purposes
 525
 526 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
 527 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
 528 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
 529 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 530
 531 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
 532 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 533
 534
 535 #ifdef __cplusplus
 536 }
 537 #endif
 538
 539 //
 540 //
 541 ////   end header file   /////////////////////////////////////////////////////
 542 #endif // STBI_INCLUDE_STB_IMAGE_H
 543
 544 #ifdef STB_IMAGE_IMPLEMENTATION
 545
 546 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
 547   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
 548   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
 549   || defined(STBI_ONLY_ZLIB)
 550    #ifndef STBI_ONLY_JPEG
 551    #define STBI_NO_JPEG
 552    #endif
 553    #ifndef STBI_ONLY_PNG
 554    #define STBI_NO_PNG
 555    #endif
 556    #ifndef STBI_ONLY_BMP
 557    #define STBI_NO_BMP
 558    #endif
 559    #ifndef STBI_ONLY_PSD
 560    #define STBI_NO_PSD
 561    #endif
 562    #ifndef STBI_ONLY_TGA
 563    #define STBI_NO_TGA
 564    #endif
 565    #ifndef STBI_ONLY_GIF
 566    #define STBI_NO_GIF
 567    #endif
 568    #ifndef STBI_ONLY_HDR
 569    #define STBI_NO_HDR
 570    #endif
 571    #ifndef STBI_ONLY_PIC
 572    #define STBI_NO_PIC
 573    #endif
 574    #ifndef STBI_ONLY_PNM
 575    #define STBI_NO_PNM
 576    #endif
 577 #endif
 578
 579 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
 580 #define STBI_NO_ZLIB
 581 #endif
 582
 583
 584 #include <stdarg.h>
 585 #include <stddef.h> // ptrdiff_t on osx
 586 #include <stdlib.h>
 587 #include <string.h>
 588 #include <limits.h>
 589
 590 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
 591 #include <math.h>  // ldexp, pow
 592 #endif
 593
 594 #ifndef STBI_NO_STDIO
 595 #include <stdio.h>
 596 #endif
 597
 598 #ifndef STBI_ASSERT
 599 #include <assert.h>
 600 #define STBI_ASSERT(x) assert(x)
 601 #endif
 602
 603 #ifdef __cplusplus
 604 #define STBI_EXTERN extern "C"
 605 #else
 606 #define STBI_EXTERN extern
 607 #endif
 608
 609
 610 #ifndef _MSC_VER
 611    #ifdef __cplusplus
 612    #define stbi_inline inline
 613    #else
 614    #define stbi_inline
 615    #endif
 616 #else
 617    #define stbi_inline __forceinline
 618 #endif
 619
 620 #ifndef STBI_NO_THREAD_LOCALS
 621    #if defined(__cplusplus) &&  __cplusplus >= 201103L
 622       #define STBI_THREAD_LOCAL       thread_local
 623    #elif defined(__GNUC__) && __GNUC__ < 5
 624       #define STBI_THREAD_LOCAL       __thread
 625    #elif defined(_MSC_VER)
 626       #define STBI_THREAD_LOCAL       __declspec(thread)
 627    #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
 628       #define STBI_THREAD_LOCAL       _Thread_local
 629    #endif
 630
 631    #ifndef STBI_THREAD_LOCAL
 632       #if defined(__GNUC__)
 633         #define STBI_THREAD_LOCAL       __thread
 634       #endif
 635    #endif
 636 #endif
 637
 638 #ifdef _MSC_VER
 639 typedef unsigned short stbi__uint16;
 640 typedef   signed short stbi__int16;
 641 typedef unsigned int   stbi__uint32;
 642 typedef   signed int   stbi__int32;
 643 #else
 644 #include <stdint.h>
 645 typedef uint16_t stbi__uint16;
 646 typedef int16_t  stbi__int16;
 647 typedef uint32_t stbi__uint32;
 648 typedef int32_t  stbi__int32;
 649 #endif
 650
 651 // should produce compiler error if size is wrong
 652 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
 653
 654 #ifdef _MSC_VER
 655 #define STBI_NOTUSED(v)  (void)(v)
 656 #else
 657 #define STBI_NOTUSED(v)  (void)sizeof(v)
 658 #endif
 659
 660 #ifdef _MSC_VER
 661 #define STBI_HAS_LROTL
 662 #endif
 663
 664 #ifdef STBI_HAS_LROTL
 665    #define stbi_lrot(x,y)  _lrotl(x,y)
 666 #else
 667    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
 668 #endif
 669
 670 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
 671 // ok
 672 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
 673 // ok
 674 #else
 675 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
 676 #endif
 677
 678 #ifndef STBI_MALLOC
 679 #define STBI_MALLOC(sz)           malloc(sz)
 680 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
 681 #define STBI_FREE(p)              free(p)
 682 #endif
 683
 684 #ifndef STBI_REALLOC_SIZED
 685 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
 686 #endif
 687
 688 // x86/x64 detection
 689 #if defined(__x86_64__) || defined(_M_X64)
 690 #define STBI__X64_TARGET
 691 #elif defined(__i386) || defined(_M_IX86)
 692 #define STBI__X86_TARGET
 693 #endif
 694
 695 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
 696 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
 697 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
 698 // but previous attempts to provide the SSE2 functions with runtime
 699 // detection caused numerous issues. The way architecture extensions are
 700 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
 701 // New behavior: if compiled with -msse2, we use SSE2 without any
 702 // detection; if not, we don't use it at all.
 703 #define STBI_NO_SIMD
 704 #endif
 705
 706 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
 707 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
 708 //
 709 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
 710 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
 711 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
 712 // simultaneously enabling "-mstackrealign".
 713 //
 714 // See https://github.com/nothings/stb/issues/81 for more information.
 715 //
 716 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
 717 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
 718 #define STBI_NO_SIMD
 719 #endif
 720
 721 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
 722 #define STBI_SSE2
 723 #include <emmintrin.h>
 724
 725 #ifdef _MSC_VER
 726
 727 #if _MSC_VER >= 1400  // not VC6
 728 #include <intrin.h> // __cpuid
 729 static int stbi__cpuid3(void)
 730 {
 731    int info[4];
 732    __cpuid(info,1);
 733    return info[3];
 734 }
 735 #else
 736 static int stbi__cpuid3(void)
 737 {
 738    int res;
 739    __asm {
 740       mov  eax,1
 741       cpuid
 742       mov  res,edx
 743    }
 744    return res;
 745 }
 746 #endif
 747
 748 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 749
 750 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 751 static int stbi__sse2_available(void)
 752 {
 753    int info3 = stbi__cpuid3();
 754    return ((info3 >> 26) & 1) != 0;
 755 }
 756 #endif
 757
 758 #else // assume GCC-style if not VC++
 759 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 760
 761 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 762 static int stbi__sse2_available(void)
 763 {
 764    // If we're even attempting to compile this on GCC/Clang, that means
 765    // -msse2 is on, which means the compiler is allowed to use SSE2
 766    // instructions at will, and so are we.
 767    return 1;
 768 }
 769 #endif
 770
 771 #endif
 772 #endif
 773
 774 // ARM NEON
 775 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
 776 #undef STBI_NEON
 777 #endif
 778
 779 #ifdef STBI_NEON
 780 #include <arm_neon.h>
 781 #ifdef _MSC_VER
 782 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 783 #else
 784 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 785 #endif
 786 #endif
 787
 788 #ifndef STBI_SIMD_ALIGN
 789 #define STBI_SIMD_ALIGN(type, name) type name
 790 #endif
 791
 792 #ifndef STBI_MAX_DIMENSIONS
 793 #define STBI_MAX_DIMENSIONS (1 << 24)
 794 #endif
 795
 796 ///////////////////////////////////////////////
 797 //
 798 //  stbi__context struct and start_xxx functions
 799
 800 // stbi__context structure is our basic context used by all images, so it
 801 // contains all the IO context, plus some basic image information
 802 typedef struct
 803 {
 804    stbi__uint32 img_x, img_y;
 805    int img_n, img_out_n;
 806
 807    stbi_io_callbacks io;
 808    void *io_user_data;
 809
 810    int read_from_callbacks;
 811    int buflen;
 812    stbi_uc buffer_start[128];
 813    int callback_already_read;
 814
 815    stbi_uc *img_buffer, *img_buffer_end;
 816    stbi_uc *img_buffer_original, *img_buffer_original_end;
 817 } stbi__context;
 818
 819
 820 static void stbi__refill_buffer(stbi__context *s);
 821
 822 // initialize a memory-decode context
 823 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
 824 {
 825    s->io.read = NULL;
 826    s->read_from_callbacks = 0;
 827    s->callback_already_read = 0;
 828    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
 829    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
 830 }
 831
 832 // initialize a callback-based context
 833 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
 834 {
 835    s->io = *c;
 836    s->io_user_data = user;
 837    s->buflen = sizeof(s->buffer_start);
 838    s->read_from_callbacks = 1;
 839    s->callback_already_read = 0;
 840    s->img_buffer = s->img_buffer_original = s->buffer_start;
 841    stbi__refill_buffer(s);
 842    s->img_buffer_original_end = s->img_buffer_end;
 843 }
 844
 845 #ifndef STBI_NO_STDIO
 846
 847 static int stbi__stdio_read(void *user, char *data, int size)
 848 {
 849    return (int) fread(data,1,size,(FILE*) user);
 850 }
 851
 852 static void stbi__stdio_skip(void *user, int n)
 853 {
 854    int ch;
 855    fseek((FILE*) user, n, SEEK_CUR);
 856    ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
 857    if (ch != EOF) {
 858       ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
 859    }
 860 }
 861
 862 static int stbi__stdio_eof(void *user)
 863 {
 864    return feof((FILE*) user) || ferror((FILE *) user);
 865 }
 866
 867 static stbi_io_callbacks stbi__stdio_callbacks =
 868 {
 869    stbi__stdio_read,
 870    stbi__stdio_skip,
 871    stbi__stdio_eof,
 872 };
 873
 874 static void stbi__start_file(stbi__context *s, FILE *f)
 875 {
 876    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
 877 }
 878
 879 //static void stop_file(stbi__context *s) { }
 880
 881 #endif // !STBI_NO_STDIO
 882
 883 static void stbi__rewind(stbi__context *s)
 884 {
 885    // conceptually rewind SHOULD rewind to the beginning of the stream,
 886    // but we just rewind to the beginning of the initial buffer, because
 887    // we only use it after doing 'test', which only ever looks at at most 92 bytes
 888    s->img_buffer = s->img_buffer_original;
 889    s->img_buffer_end = s->img_buffer_original_end;
 890 }
 891
 892 enum
 893 {
 894    STBI_ORDER_RGB,
 895    STBI_ORDER_BGR
 896 };
 897
 898 typedef struct
 899 {
 900    int bits_per_channel;
 901    int num_channels;
 902    int channel_order;
 903 } stbi__result_info;
 904
 905 #ifndef STBI_NO_JPEG
 906 static int      stbi__jpeg_test(stbi__context *s);
 907 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 908 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
 909 #endif
 910
 911 #ifndef STBI_NO_PNG
 912 static int      stbi__png_test(stbi__context *s);
 913 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 914 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
 915 static int      stbi__png_is16(stbi__context *s);
 916 #endif
 917
 918 #ifndef STBI_NO_BMP
 919 static int      stbi__bmp_test(stbi__context *s);
 920 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 921 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
 922 #endif
 923
 924 #ifndef STBI_NO_TGA
 925 static int      stbi__tga_test(stbi__context *s);
 926 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 927 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
 928 #endif
 929
 930 #ifndef STBI_NO_PSD
 931 static int      stbi__psd_test(stbi__context *s);
 932 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
 933 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
 934 static int      stbi__psd_is16(stbi__context *s);
 935 #endif
 936
 937 #ifndef STBI_NO_HDR
 938 static int      stbi__hdr_test(stbi__context *s);
 939 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 940 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
 941 #endif
 942
 943 #ifndef STBI_NO_PIC
 944 static int      stbi__pic_test(stbi__context *s);
 945 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 946 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
 947 #endif
 948
 949 #ifndef STBI_NO_GIF
 950 static int      stbi__gif_test(stbi__context *s);
 951 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 952 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 953 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
 954 #endif
 955
 956 #ifndef STBI_NO_PNM
 957 static int      stbi__pnm_test(stbi__context *s);
 958 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 959 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
 960 static int      stbi__pnm_is16(stbi__context *s);
 961 #endif
 962
 963 static
 964 #ifdef STBI_THREAD_LOCAL
 965 STBI_THREAD_LOCAL
 966 #endif
 967 const char *stbi__g_failure_reason;
 968
 969 STBIDEF const char *stbi_failure_reason(void)
 970 {
 971    return stbi__g_failure_reason;
 972 }
 973
 974 #ifndef STBI_NO_FAILURE_STRINGS
 975 static int stbi__err(const char *str)
 976 {
 977    stbi__g_failure_reason = str;
 978    return 0;
 979 }
 980 #endif
 981
 982 static void *stbi__malloc(size_t size)
 983 {
 984     return STBI_MALLOC(size);
 985 }
 986
 987 // stb_image uses ints pervasively, including for offset calculations.
 988 // therefore the largest decoded image size we can support with the
 989 // current code, even on 64-bit targets, is INT_MAX. this is not a
 990 // significant limitation for the intended use case.
 991 //
 992 // we do, however, need to make sure our size calculations don't
 993 // overflow. hence a few helper functions for size calculations that
 994 // multiply integers together, making sure that they're non-negative
 995 // and no overflow occurs.
 996
 997 // return 1 if the sum is valid, 0 on overflow.
 998 // negative terms are considered invalid.
 999 static int stbi__addsizes_valid(int a, int b)
1000 {
1001    if (b < 0) return 0;
1002    // now 0 <= b <= INT_MAX, hence also
1003    // 0 <= INT_MAX - b <= INTMAX.
1004    // And "a + b <= INT_MAX" (which might overflow) is the
1005    // same as a <= INT_MAX - b (no overflow)
1006    return a <= INT_MAX - b;
1007 }
1008
1009 // returns 1 if the product is valid, 0 on overflow.
1010 // negative factors are considered invalid.
1011 static int stbi__mul2sizes_valid(int a, int b)
1012 {
1013    if (a < 0 || b < 0) return 0;
1014    if (b == 0) return 1; // mul-by-0 is always safe
1015    // portable way to check for no overflows in a*b
1016    return a <= INT_MAX/b;
1017 }
1018
1019 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1020 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
1021 static int stbi__mad2sizes_valid(int a, int b, int add)
1022 {
1023    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
1024 }
1025 #endif
1026
1027 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
1028 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
1029 {
1030    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1031       stbi__addsizes_valid(a*b*c, add);
1032 }
1033
1034 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
1035 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
1036 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
1037 {
1038    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1039       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
1040 }
1041 #endif
1042
1043 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1044 // mallocs with size overflow checking
1045 static void *stbi__malloc_mad2(int a, int b, int add)
1046 {
1047    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
1048    return stbi__malloc(a*b + add);
1049 }
1050 #endif
1051
1052 static void *stbi__malloc_mad3(int a, int b, int c, int add)
1053 {
1054    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
1055    return stbi__malloc(a*b*c + add);
1056 }
1057
1058 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
1059 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
1060 {
1061    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
1062    return stbi__malloc(a*b*c*d + add);
1063 }
1064 #endif
1065
1066 // stbi__err - error
1067 // stbi__errpf - error returning pointer to float
1068 // stbi__errpuc - error returning pointer to unsigned char
1069
1070 #ifdef STBI_NO_FAILURE_STRINGS
1071    #define stbi__err(x,y)  0
1072 #elif defined(STBI_FAILURE_USERMSG)
1073    #define stbi__err(x,y)  stbi__err(y)
1074 #else
1075    #define stbi__err(x,y)  stbi__err(x)
1076 #endif
1077
1078 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1079 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1080
1081 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1082 {
1083    STBI_FREE(retval_from_stbi_load);
1084 }
1085
1086 #ifndef STBI_NO_LINEAR
1087 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1088 #endif
1089
1090 #ifndef STBI_NO_HDR
1091 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
1092 #endif
1093
1094 static int stbi__vertically_flip_on_load_global = 0;
1095
1096 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1097 {
1098    stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
1099 }
1100
1101 #ifndef STBI_THREAD_LOCAL
1102 #define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
1103 #else
1104 static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
1105
1106 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
1107 {
1108    stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
1109    stbi__vertically_flip_on_load_set = 1;
1110 }
1111
1112 #define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
1113                                          ? stbi__vertically_flip_on_load_local  \
1114                                          : stbi__vertically_flip_on_load_global)
1115 #endif // STBI_THREAD_LOCAL
1116
1117 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1118 {
1119    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1120    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1121    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1122    ri->num_channels = 0;
1123
1124    // test the formats with a very explicit header first (at least a FOURCC
1125    // or distinctive magic number first)
1126    #ifndef STBI_NO_PNG
1127    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
1128    #endif
1129    #ifndef STBI_NO_BMP
1130    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
1131    #endif
1132    #ifndef STBI_NO_GIF
1133    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
1134    #endif
1135    #ifndef STBI_NO_PSD
1136    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
1137    #else
1138    STBI_NOTUSED(bpc);
1139    #endif
1140    #ifndef STBI_NO_PIC
1141    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
1142    #endif
1143
1144    // then the formats that can end up attempting to load with just 1 or 2
1145    // bytes matching expectations; these are prone to false positives, so
1146    // try them later
1147    #ifndef STBI_NO_JPEG
1148    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
1149    #endif
1150    #ifndef STBI_NO_PNM
1151    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
1152    #endif
1153
1154    #ifndef STBI_NO_HDR
1155    if (stbi__hdr_test(s)) {
1156       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
1157       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1158    }
1159    #endif
1160
1161    #ifndef STBI_NO_TGA
1162    // test tga last because it's a crappy test!
1163    if (stbi__tga_test(s))
1164       return stbi__tga_load(s,x,y,comp,req_comp, ri);
1165    #endif
1166
1167    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1168 }
1169
1170 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1171 {
1172    int i;
1173    int img_len = w * h * channels;
1174    stbi_uc *reduced;
1175
1176    reduced = (stbi_uc *) stbi__malloc(img_len);
1177    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1178
1179    for (i = 0; i < img_len; ++i)
1180       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
1181
1182    STBI_FREE(orig);
1183    return reduced;
1184 }
1185
1186 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1187 {
1188    int i;
1189    int img_len = w * h * channels;
1190    stbi__uint16 *enlarged;
1191
1192    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
1193    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1194
1195    for (i = 0; i < img_len; ++i)
1196       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1197
1198    STBI_FREE(orig);
1199    return enlarged;
1200 }
1201
1202 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
1203 {
1204    int row;
1205    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
1206    stbi_uc temp[2048];
1207    stbi_uc *bytes = (stbi_uc *)image;
1208
1209    for (row = 0; row < (h>>1); row++) {
1210       stbi_uc *row0 = bytes + row*bytes_per_row;
1211       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
1212       // swap row0 with row1
1213       size_t bytes_left = bytes_per_row;
1214       while (bytes_left) {
1215          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
1216          memcpy(temp, row0, bytes_copy);
1217          memcpy(row0, row1, bytes_copy);
1218          memcpy(row1, temp, bytes_copy);
1219          row0 += bytes_copy;
1220          row1 += bytes_copy;
1221          bytes_left -= bytes_copy;
1222       }
1223    }
1224 }
1225
1226 #ifndef STBI_NO_GIF
1227 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
1228 {
1229    int slice;
1230    int slice_size = w * h * bytes_per_pixel;
1231
1232    stbi_uc *bytes = (stbi_uc *)image;
1233    for (slice = 0; slice < z; ++slice) {
1234       stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
1235       bytes += slice_size;
1236    }
1237 }
1238 #endif
1239
1240 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1241 {
1242    stbi__result_info ri;
1243    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1244
1245    if (result == NULL)
1246       return NULL;
1247
1248    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1249    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1250
1251    if (ri.bits_per_channel != 8) {
1252       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1253       ri.bits_per_channel = 8;
1254    }
1255
1256    // @TODO: move stbi__convert_format to here
1257
1258    if (stbi__vertically_flip_on_load) {
1259       int channels = req_comp ? req_comp : *comp;
1260       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
1261    }
1262
1263    return (unsigned char *) result;
1264 }
1265
1266 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1267 {
1268    stbi__result_info ri;
1269    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1270
1271    if (result == NULL)
1272       return NULL;
1273
1274    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1275    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1276
1277    if (ri.bits_per_channel != 16) {
1278       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1279       ri.bits_per_channel = 16;
1280    }
1281
1282    // @TODO: move stbi__convert_format16 to here
1283    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1284
1285    if (stbi__vertically_flip_on_load) {
1286       int channels = req_comp ? req_comp : *comp;
1287       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
1288    }
1289
1290    return (stbi__uint16 *) result;
1291 }
1292
1293 #if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
1294 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1295 {
1296    if (stbi__vertically_flip_on_load && result != NULL) {
1297       int channels = req_comp ? req_comp : *comp;
1298       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
1299    }
1300 }
1301 #endif
1302
1303 #ifndef STBI_NO_STDIO
1304
1305 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1306 STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
1307 STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
1308 #endif
1309
1310 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1311 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
1312 {
1313         return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
1314 }
1315 #endif
1316
1317 static FILE *stbi__fopen(char const *filename, char const *mode)
1318 {
1319    FILE *f;
1320 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1321    wchar_t wMode[64];
1322    wchar_t wFilename[1024];
1323         if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
1324       return 0;
1325
1326         if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
1327       return 0;
1328
1329 #if defined(_MSC_VER) && _MSC_VER >= 1400
1330         if (0 != _wfopen_s(&f, wFilename, wMode))
1331                 f = 0;
1332 #else
1333    f = _wfopen(wFilename, wMode);
1334 #endif
1335
1336 #elif defined(_MSC_VER) && _MSC_VER >= 1400
1337    if (0 != fopen_s(&f, filename, mode))
1338       f=0;
1339 #else
1340    f = fopen(filename, mode);
1341 #endif
1342    return f;
1343 }
1344
1345
1346 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1347 {
1348    FILE *f = stbi__fopen(filename, "rb");
1349    unsigned char *result;
1350    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1351    result = stbi_load_from_file(f,x,y,comp,req_comp);
1352    fclose(f);
1353    return result;
1354 }
1355
1356 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1357 {
1358    unsigned char *result;
1359    stbi__context s;
1360    stbi__start_file(&s,f);
1361    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1362    if (result) {
1363       // need to 'unget' all the characters in the IO buffer
1364       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1365    }
1366    return result;
1367 }
1368
1369 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1370 {
1371    stbi__uint16 *result;
1372    stbi__context s;
1373    stbi__start_file(&s,f);
1374    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
1375    if (result) {
1376       // need to 'unget' all the characters in the IO buffer
1377       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1378    }
1379    return result;
1380 }
1381
1382 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1383 {
1384    FILE *f = stbi__fopen(filename, "rb");
1385    stbi__uint16 *result;
1386    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
1387    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
1388    fclose(f);
1389    return result;
1390 }
1391
1392
1393 #endif //!STBI_NO_STDIO
1394
1395 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
1396 {
1397    stbi__context s;
1398    stbi__start_mem(&s,buffer,len);
1399    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1400 }
1401
1402 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
1403 {
1404    stbi__context s;
1405    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1406    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1407 }
1408
1409 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1410 {
1411    stbi__context s;
1412    stbi__start_mem(&s,buffer,len);
1413    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1414 }
1415
1416 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1417 {
1418    stbi__context s;
1419    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1420    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1421 }
1422
1423 #ifndef STBI_NO_GIF
1424 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
1425 {
1426    unsigned char *result;
1427    stbi__context s;
1428    stbi__start_mem(&s,buffer,len);
1429
1430    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
1431    if (stbi__vertically_flip_on_load) {
1432       stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
1433    }
1434
1435    return result;
1436 }
1437 #endif
1438
1439 #ifndef STBI_NO_LINEAR
1440 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1441 {
1442    unsigned char *data;
1443    #ifndef STBI_NO_HDR
1444    if (stbi__hdr_test(s)) {
1445       stbi__result_info ri;
1446       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
1447       if (hdr_data)
1448          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1449       return hdr_data;
1450    }
1451    #endif
1452    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1453    if (data)
1454       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1455    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1456 }
1457
1458 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1459 {
1460    stbi__context s;
1461    stbi__start_mem(&s,buffer,len);
1462    return stbi__loadf_main(&s,x,y,comp,req_comp);
1463 }
1464
1465 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1466 {
1467    stbi__context s;
1468    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1469    return stbi__loadf_main(&s,x,y,comp,req_comp);
1470 }
1471
1472 #ifndef STBI_NO_STDIO
1473 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1474 {
1475    float *result;
1476    FILE *f = stbi__fopen(filename, "rb");
1477    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1478    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1479    fclose(f);
1480    return result;
1481 }
1482
1483 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1484 {
1485    stbi__context s;
1486    stbi__start_file(&s,f);
1487    return stbi__loadf_main(&s,x,y,comp,req_comp);
1488 }
1489 #endif // !STBI_NO_STDIO
1490
1491 #endif // !STBI_NO_LINEAR
1492
1493 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1494 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1495 // reports false!
1496
1497 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1498 {
1499    #ifndef STBI_NO_HDR
1500    stbi__context s;
1501    stbi__start_mem(&s,buffer,len);
1502    return stbi__hdr_test(&s);
1503    #else
1504    STBI_NOTUSED(buffer);
1505    STBI_NOTUSED(len);
1506    return 0;
1507    #endif
1508 }
1509
1510 #ifndef STBI_NO_STDIO
1511 STBIDEF int      stbi_is_hdr          (char const *filename)
1512 {
1513    FILE *f = stbi__fopen(filename, "rb");
1514    int result=0;
1515    if (f) {
1516       result = stbi_is_hdr_from_file(f);
1517       fclose(f);
1518    }
1519    return result;
1520 }
1521
1522 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1523 {
1524    #ifndef STBI_NO_HDR
1525    long pos = ftell(f);
1526    int res;
1527    stbi__context s;
1528    stbi__start_file(&s,f);
1529    res = stbi__hdr_test(&s);
1530    fseek(f, pos, SEEK_SET);
1531    return res;
1532    #else
1533    STBI_NOTUSED(f);
1534    return 0;
1535    #endif
1536 }
1537 #endif // !STBI_NO_STDIO
1538
1539 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1540 {
1541    #ifndef STBI_NO_HDR
1542    stbi__context s;
1543    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1544    return stbi__hdr_test(&s);
1545    #else
1546    STBI_NOTUSED(clbk);
1547    STBI_NOTUSED(user);
1548    return 0;
1549    #endif
1550 }
1551
1552 #ifndef STBI_NO_LINEAR
1553 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1554
1555 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1556 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1557 #endif
1558
1559 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1560
1561 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1562 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1563
1564
1565 //////////////////////////////////////////////////////////////////////////////
1566 //
1567 // Common code used by all image loaders
1568 //
1569
1570 enum
1571 {
1572    STBI__SCAN_load=0,
1573    STBI__SCAN_type,
1574    STBI__SCAN_header
1575 };
1576
1577 static void stbi__refill_buffer(stbi__context *s)
1578 {
1579    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1580    s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
1581    if (n == 0) {
1582       // at end of file, treat same as if from memory, but need to handle case
1583       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1584       s->read_from_callbacks = 0;
1585       s->img_buffer = s->buffer_start;
1586       s->img_buffer_end = s->buffer_start+1;
1587       *s->img_buffer = 0;
1588    } else {
1589       s->img_buffer = s->buffer_start;
1590       s->img_buffer_end = s->buffer_start + n;
1591    }
1592 }
1593
1594 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1595 {
1596    if (s->img_buffer < s->img_buffer_end)
1597       return *s->img_buffer++;
1598    if (s->read_from_callbacks) {
1599       stbi__refill_buffer(s);
1600       return *s->img_buffer++;
1601    }
1602    return 0;
1603 }
1604
1605 #if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1606 // nothing
1607 #else
1608 stbi_inline static int stbi__at_eof(stbi__context *s)
1609 {
1610    if (s->io.read) {
1611       if (!(s->io.eof)(s->io_user_data)) return 0;
1612       // if feof() is true, check if buffer = end
1613       // special case: we've only got the special 0 character at the end
1614       if (s->read_from_callbacks == 0) return 1;
1615    }
1616
1617    return s->img_buffer >= s->img_buffer_end;
1618 }
1619 #endif
1620
1621 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
1622 // nothing
1623 #else
1624 static void stbi__skip(stbi__context *s, int n)
1625 {
1626    if (n == 0) return;  // already there!
1627    if (n < 0) {
1628       s->img_buffer = s->img_buffer_end;
1629       return;
1630    }
1631    if (s->io.read) {
1632       int blen = (int) (s->img_buffer_end - s->img_buffer);
1633       if (blen < n) {
1634          s->img_buffer = s->img_buffer_end;
1635          (s->io.skip)(s->io_user_data, n - blen);
1636          return;
1637       }
1638    }
1639    s->img_buffer += n;
1640 }
1641 #endif
1642
1643 #if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
1644 // nothing
1645 #else
1646 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1647 {
1648    if (s->io.read) {
1649       int blen = (int) (s->img_buffer_end - s->img_buffer);
1650       if (blen < n) {
1651          int res, count;
1652
1653          memcpy(buffer, s->img_buffer, blen);
1654
1655          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1656          res = (count == (n-blen));
1657          s->img_buffer = s->img_buffer_end;
1658          return res;
1659       }
1660    }
1661
1662    if (s->img_buffer+n <= s->img_buffer_end) {
1663       memcpy(buffer, s->img_buffer, n);
1664       s->img_buffer += n;
1665       return 1;
1666    } else
1667       return 0;
1668 }
1669 #endif
1670
1671 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1672 // nothing
1673 #else
1674 static int stbi__get16be(stbi__context *s)
1675 {
1676    int z = stbi__get8(s);
1677    return (z << 8) + stbi__get8(s);
1678 }
1679 #endif
1680
1681 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1682 // nothing
1683 #else
1684 static stbi__uint32 stbi__get32be(stbi__context *s)
1685 {
1686    stbi__uint32 z = stbi__get16be(s);
1687    return (z << 16) + stbi__get16be(s);
1688 }
1689 #endif
1690
1691 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1692 // nothing
1693 #else
1694 static int stbi__get16le(stbi__context *s)
1695 {
1696    int z = stbi__get8(s);
1697    return z + (stbi__get8(s) << 8);
1698 }
1699 #endif
1700
1701 #ifndef STBI_NO_BMP
1702 static stbi__uint32 stbi__get32le(stbi__context *s)
1703 {
1704    stbi__uint32 z = stbi__get16le(s);
1705    z += (stbi__uint32)stbi__get16le(s) << 16;
1706    return z;
1707 }
1708 #endif
1709
1710 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1711
1712 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1713 // nothing
1714 #else
1715 //////////////////////////////////////////////////////////////////////////////
1716 //
1717 //  generic converter from built-in img_n to req_comp
1718 //    individual types do this automatically as much as possible (e.g. jpeg
1719 //    does all cases internally since it needs to colorspace convert anyway,
1720 //    and it never has alpha, so very few cases ). png can automatically
1721 //    interleave an alpha=255 channel, but falls back to this for other cases
1722 //
1723 //  assume data buffer is malloced, so malloc a new one and free that one
1724 //  only failure mode is malloc failing
1725
1726 static stbi_uc stbi__compute_y(int r, int g, int b)
1727 {
1728    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1729 }
1730 #endif
1731
1732 #if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1733 // nothing
1734 #else
1735 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1736 {
1737    int i,j;
1738    unsigned char *good;
1739
1740    if (req_comp == img_n) return data;
1741    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1742
1743    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
1744    if (good == NULL) {
1745       STBI_FREE(data);
1746       return stbi__errpuc("outofmem", "Out of memory");
1747    }
1748
1749    for (j=0; j < (int) y; ++j) {
1750       unsigned char *src  = data + j * x * img_n   ;
1751       unsigned char *dest = good + j * x * req_comp;
1752
1753       #define STBI__COMBO(a,b)  ((a)*8+(b))
1754       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1755       // convert source image with img_n components to one with req_comp components;
1756       // avoid switch per pixel, so use switch per scanline and massive macros
1757       switch (STBI__COMBO(img_n, req_comp)) {
1758          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
1759          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1760          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
1761          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
1762          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1763          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
1764          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
1765          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1766          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
1767          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1768          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1769          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
1770          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
1771       }
1772       #undef STBI__CASE
1773    }
1774
1775    STBI_FREE(data);
1776    return good;
1777 }
1778 #endif
1779
1780 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1781 // nothing
1782 #else
1783 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1784 {
1785    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
1786 }
1787 #endif
1788
1789 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1790 // nothing
1791 #else
1792 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1793 {
1794    int i,j;
1795    stbi__uint16 *good;
1796
1797    if (req_comp == img_n) return data;
1798    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1799
1800    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
1801    if (good == NULL) {
1802       STBI_FREE(data);
1803       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1804    }
1805
1806    for (j=0; j < (int) y; ++j) {
1807       stbi__uint16 *src  = data + j * x * img_n   ;
1808       stbi__uint16 *dest = good + j * x * req_comp;
1809
1810       #define STBI__COMBO(a,b)  ((a)*8+(b))
1811       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1812       // convert source image with img_n components to one with req_comp components;
1813       // avoid switch per pixel, so use switch per scanline and massive macros
1814       switch (STBI__COMBO(img_n, req_comp)) {
1815          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
1816          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1817          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
1818          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
1819          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1820          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
1821          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
1822          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1823          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
1824          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1825          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1826          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
1827          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
1828       }
1829       #undef STBI__CASE
1830    }
1831
1832    STBI_FREE(data);
1833    return good;
1834 }
1835 #endif
1836
1837 #ifndef STBI_NO_LINEAR
1838 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1839 {
1840    int i,k,n;
1841    float *output;
1842    if (!data) return NULL;
1843    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1844    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1845    // compute number of non-alpha components
1846    if (comp & 1) n = comp; else n = comp-1;
1847    for (i=0; i < x*y; ++i) {
1848       for (k=0; k < n; ++k) {
1849          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1850       }
1851    }
1852    if (n < comp) {
1853       for (i=0; i < x*y; ++i) {
1854          output[i*comp + n] = data[i*comp + n]/255.0f;
1855       }
1856    }
1857    STBI_FREE(data);
1858    return output;
1859 }
1860 #endif
1861
1862 #ifndef STBI_NO_HDR
1863 #define stbi__float2int(x)   ((int) (x))
1864 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1865 {
1866    int i,k,n;
1867    stbi_uc *output;
1868    if (!data) return NULL;
1869    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
1870    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1871    // compute number of non-alpha components
1872    if (comp & 1) n = comp; else n = comp-1;
1873    for (i=0; i < x*y; ++i) {
1874       for (k=0; k < n; ++k) {
1875          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1876          if (z < 0) z = 0;
1877          if (z > 255) z = 255;
1878          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1879       }
1880       if (k < comp) {
1881          float z = data[i*comp+k] * 255 + 0.5f;
1882          if (z < 0) z = 0;
1883          if (z > 255) z = 255;
1884          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1885       }
1886    }
1887    STBI_FREE(data);
1888    return output;
1889 }
1890 #endif
1891
1892 //////////////////////////////////////////////////////////////////////////////
1893 //
1894 //  "baseline" JPEG/JFIF decoder
1895 //
1896 //    simple implementation
1897 //      - doesn't support delayed output of y-dimension
1898 //      - simple interface (only one output format: 8-bit interleaved RGB)
1899 //      - doesn't try to recover corrupt jpegs
1900 //      - doesn't allow partial loading, loading multiple at once
1901 //      - still fast on x86 (copying globals into locals doesn't help x86)
1902 //      - allocates lots of intermediate memory (full size of all components)
1903 //        - non-interleaved case requires this anyway
1904 //        - allows good upsampling (see next)
1905 //    high-quality
1906 //      - upsampled channels are bilinearly interpolated, even across blocks
1907 //      - quality integer IDCT derived from IJG's 'slow'
1908 //    performance
1909 //      - fast huffman; reasonable integer IDCT
1910 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1911 //      - uses a lot of intermediate memory, could cache poorly
1912
1913 #ifndef STBI_NO_JPEG
1914
1915 // huffman decoding acceleration
1916 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1917
1918 typedef struct
1919 {
1920    stbi_uc  fast[1 << FAST_BITS];
1921    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1922    stbi__uint16 code[256];
1923    stbi_uc  values[256];
1924    stbi_uc  size[257];
1925    unsigned int maxcode[18];
1926    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1927 } stbi__huffman;
1928
1929 typedef struct
1930 {
1931    stbi__context *s;
1932    stbi__huffman huff_dc[4];
1933    stbi__huffman huff_ac[4];
1934    stbi__uint16 dequant[4][64];
1935    stbi__int16 fast_ac[4][1 << FAST_BITS];
1936
1937 // sizes for components, interleaved MCUs
1938    int img_h_max, img_v_max;
1939    int img_mcu_x, img_mcu_y;
1940    int img_mcu_w, img_mcu_h;
1941
1942 // definition of jpeg image component
1943    struct
1944    {
1945       int id;
1946       int h,v;
1947       int tq;
1948       int hd,ha;
1949       int dc_pred;
1950
1951       int x,y,w2,h2;
1952       stbi_uc *data;
1953       void *raw_data, *raw_coeff;
1954       stbi_uc *linebuf;
1955       short   *coeff;   // progressive only
1956       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1957    } img_comp[4];
1958
1959    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1960    int            code_bits;   // number of valid bits
1961    unsigned char  marker;      // marker seen while filling entropy buffer
1962    int            nomore;      // flag if we saw a marker so must stop
1963
1964    int            progressive;
1965    int            spec_start;
1966    int            spec_end;
1967    int            succ_high;
1968    int            succ_low;
1969    int            eob_run;
1970    int            jfif;
1971    int            app14_color_transform; // Adobe APP14 tag
1972    int            rgb;
1973
1974    int scan_n, order[4];
1975    int restart_interval, todo;
1976
1977 // kernels
1978    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1979    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1980    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1981 } stbi__jpeg;
1982
1983 static int stbi__build_huffman(stbi__huffman *h, int *count)
1984 {
1985    int i,j,k=0;
1986    unsigned int code;
1987    // build size list for each symbol (from JPEG spec)
1988    for (i=0; i < 16; ++i)
1989       for (j=0; j < count[i]; ++j)
1990          h->size[k++] = (stbi_uc) (i+1);
1991    h->size[k] = 0;
1992
1993    // compute actual symbols (from jpeg spec)
1994    code = 0;
1995    k = 0;
1996    for(j=1; j <= 16; ++j) {
1997       // compute delta to add to code to compute symbol id
1998       h->delta[j] = k - code;
1999       if (h->size[k] == j) {
2000          while (h->size[k] == j)
2001             h->code[k++] = (stbi__uint16) (code++);
2002          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
2003       }
2004       // compute largest code + 1 for this size, preshifted as needed later
2005       h->maxcode[j] = code << (16-j);
2006       code <<= 1;
2007    }
2008    h->maxcode[j] = 0xffffffff;
2009
2010    // build non-spec acceleration table; 255 is flag for not-accelerated
2011    memset(h->fast, 255, 1 << FAST_BITS);
2012    for (i=0; i < k; ++i) {
2013       int s = h->size[i];
2014       if (s <= FAST_BITS) {
2015          int c = h->code[i] << (FAST_BITS-s);
2016          int m = 1 << (FAST_BITS-s);
2017          for (j=0; j < m; ++j) {
2018             h->fast[c+j] = (stbi_uc) i;
2019          }
2020       }
2021    }
2022    return 1;
2023 }
2024
2025 // build a table that decodes both magnitude and value of small ACs in
2026 // one go.
2027 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
2028 {
2029    int i;
2030    for (i=0; i < (1 << FAST_BITS); ++i) {
2031       stbi_uc fast = h->fast[i];
2032       fast_ac[i] = 0;
2033       if (fast < 255) {
2034          int rs = h->values[fast];
2035          int run = (rs >> 4) & 15;
2036          int magbits = rs & 15;
2037          int len = h->size[fast];
2038
2039          if (magbits && len + magbits <= FAST_BITS) {
2040             // magnitude code followed by receive_extend code
2041             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
2042             int m = 1 << (magbits - 1);
2043             if (k < m) k += (~0U << magbits) + 1;
2044             // if the result is small enough, we can fit it in fast_ac table
2045             if (k >= -128 && k <= 127)
2046                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
2047          }
2048       }
2049    }
2050 }
2051
2052 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
2053 {
2054    do {
2055       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
2056       if (b == 0xff) {
2057          int c = stbi__get8(j->s);
2058          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
2059          if (c != 0) {
2060             j->marker = (unsigned char) c;
2061             j->nomore = 1;
2062             return;
2063          }
2064       }
2065       j->code_buffer |= b << (24 - j->code_bits);
2066       j->code_bits += 8;
2067    } while (j->code_bits <= 24);
2068 }
2069
2070 // (1 << n) - 1
2071 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
2072
2073 // decode a jpeg huffman value from the bitstream
2074 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
2075 {
2076    unsigned int temp;
2077    int c,k;
2078
2079    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2080
2081    // look at the top FAST_BITS and determine what symbol ID it is,
2082    // if the code is <= FAST_BITS
2083    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2084    k = h->fast[c];
2085    if (k < 255) {
2086       int s = h->size[k];
2087       if (s > j->code_bits)
2088          return -1;
2089       j->code_buffer <<= s;
2090       j->code_bits -= s;
2091       return h->values[k];
2092    }
2093
2094    // naive test is to shift the code_buffer down so k bits are
2095    // valid, then test against maxcode. To speed this up, we've
2096    // preshifted maxcode left so that it has (16-k) 0s at the
2097    // end; in other words, regardless of the number of bits, it
2098    // wants to be compared against something shifted to have 16;
2099    // that way we don't need to shift inside the loop.
2100    temp = j->code_buffer >> 16;
2101    for (k=FAST_BITS+1 ; ; ++k)
2102       if (temp < h->maxcode[k])
2103          break;
2104    if (k == 17) {
2105       // error! code not found
2106       j->code_bits -= 16;
2107       return -1;
2108    }
2109
2110    if (k > j->code_bits)
2111       return -1;
2112
2113    // convert the huffman code to the symbol id
2114    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
2115    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
2116
2117    // convert the id to a symbol
2118    j->code_bits -= k;
2119    j->code_buffer <<= k;
2120    return h->values[c];
2121 }
2122
2123 // bias[n] = (-1<<n) + 1
2124 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
2125
2126 // combined JPEG 'receive' and JPEG 'extend', since baseline
2127 // always extends everything it receives.
2128 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
2129 {
2130    unsigned int k;
2131    int sgn;
2132    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2133
2134    sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
2135    k = stbi_lrot(j->code_buffer, n);
2136    j->code_buffer = k & ~stbi__bmask[n];
2137    k &= stbi__bmask[n];
2138    j->code_bits -= n;
2139    return k + (stbi__jbias[n] & (sgn - 1));
2140 }
2141
2142 // get some unsigned bits
2143 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
2144 {
2145    unsigned int k;
2146    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2147    k = stbi_lrot(j->code_buffer, n);
2148    j->code_buffer = k & ~stbi__bmask[n];
2149    k &= stbi__bmask[n];
2150    j->code_bits -= n;
2151    return k;
2152 }
2153
2154 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
2155 {
2156    unsigned int k;
2157    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
2158    k = j->code_buffer;
2159    j->code_buffer <<= 1;
2160    --j->code_bits;
2161    return k & 0x80000000;
2162 }
2163
2164 // given a value that's at position X in the zigzag stream,
2165 // where does it appear in the 8x8 matrix coded as row-major?
2166 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
2167 {
2168     0,  1,  8, 16,  9,  2,  3, 10,
2169    17, 24, 32, 25, 18, 11,  4,  5,
2170    12, 19, 26, 33, 40, 48, 41, 34,
2171    27, 20, 13,  6,  7, 14, 21, 28,
2172    35, 42, 49, 56, 57, 50, 43, 36,
2173    29, 22, 15, 23, 30, 37, 44, 51,
2174    58, 59, 52, 45, 38, 31, 39, 46,
2175    53, 60, 61, 54, 47, 55, 62, 63,
2176    // let corrupt input sample past end
2177    63, 63, 63, 63, 63, 63, 63, 63,
2178    63, 63, 63, 63, 63, 63, 63
2179 };
2180
2181 // decode one 64-entry block--
2182 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
2183 {
2184    int diff,dc,k;
2185    int t;
2186
2187    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2188    t = stbi__jpeg_huff_decode(j, hdc);
2189    if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
2190
2191    // 0 all the ac values now so we can do it 32-bits at a time
2192    memset(data,0,64*sizeof(data[0]));
2193
2194    diff = t ? stbi__extend_receive(j, t) : 0;
2195    dc = j->img_comp[b].dc_pred + diff;
2196    j->img_comp[b].dc_pred = dc;
2197    data[0] = (short) (dc * dequant[0]);
2198
2199    // decode AC components, see JPEG spec
2200    k = 1;
2201    do {
2202       unsigned int zig;
2203       int c,r,s;
2204       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2205       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2206       r = fac[c];
2207       if (r) { // fast-AC path
2208          k += (r >> 4) & 15; // run
2209          s = r & 15; // combined length
2210          j->code_buffer <<= s;
2211          j->code_bits -= s;
2212          // decode into unzigzag'd location
2213          zig = stbi__jpeg_dezigzag[k++];
2214          data[zig] = (short) ((r >> 8) * dequant[zig]);
2215       } else {
2216          int rs = stbi__jpeg_huff_decode(j, hac);
2217          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2218          s = rs & 15;
2219          r = rs >> 4;
2220          if (s == 0) {
2221             if (rs != 0xf0) break; // end block
2222             k += 16;
2223          } else {
2224             k += r;
2225             // decode into unzigzag'd location
2226             zig = stbi__jpeg_dezigzag[k++];
2227             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
2228          }
2229       }
2230    } while (k < 64);
2231    return 1;
2232 }
2233
2234 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2235 {
2236    int diff,dc;
2237    int t;
2238    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2239
2240    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2241
2242    if (j->succ_high == 0) {
2243       // first scan for DC coefficient, must be first
2244       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
2245       t = stbi__jpeg_huff_decode(j, hdc);
2246       if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2247       diff = t ? stbi__extend_receive(j, t) : 0;
2248
2249       dc = j->img_comp[b].dc_pred + diff;
2250       j->img_comp[b].dc_pred = dc;
2251       data[0] = (short) (dc * (1 << j->succ_low));
2252    } else {
2253       // refinement scan for DC coefficient
2254       if (stbi__jpeg_get_bit(j))
2255          data[0] += (short) (1 << j->succ_low);
2256    }
2257    return 1;
2258 }
2259
2260 // @OPTIMIZE: store non-zigzagged during the decode passes,
2261 // and only de-zigzag when dequantizing
2262 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2263 {
2264    int k;
2265    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2266
2267    if (j->succ_high == 0) {
2268       int shift = j->succ_low;
2269
2270       if (j->eob_run) {
2271          --j->eob_run;
2272          return 1;
2273       }
2274
2275       k = j->spec_start;
2276       do {
2277          unsigned int zig;
2278          int c,r,s;
2279          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2280          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2281          r = fac[c];
2282          if (r) { // fast-AC path
2283             k += (r >> 4) & 15; // run
2284             s = r & 15; // combined length
2285             j->code_buffer <<= s;
2286             j->code_bits -= s;
2287             zig = stbi__jpeg_dezigzag[k++];
2288             data[zig] = (short) ((r >> 8) * (1 << shift));
2289          } else {
2290             int rs = stbi__jpeg_huff_decode(j, hac);
2291             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2292             s = rs & 15;
2293             r = rs >> 4;
2294             if (s == 0) {
2295                if (r < 15) {
2296                   j->eob_run = (1 << r);
2297                   if (r)
2298                      j->eob_run += stbi__jpeg_get_bits(j, r);
2299                   --j->eob_run;
2300                   break;
2301                }
2302                k += 16;
2303             } else {
2304                k += r;
2305                zig = stbi__jpeg_dezigzag[k++];
2306                data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
2307             }
2308          }
2309       } while (k <= j->spec_end);
2310    } else {
2311       // refinement scan for these AC coefficients
2312
2313       short bit = (short) (1 << j->succ_low);
2314
2315       if (j->eob_run) {
2316          --j->eob_run;
2317          for (k = j->spec_start; k <= j->spec_end; ++k) {
2318             short *p = &data[stbi__jpeg_dezigzag[k]];
2319             if (*p != 0)
2320                if (stbi__jpeg_get_bit(j))
2321                   if ((*p & bit)==0) {
2322                      if (*p > 0)
2323                         *p += bit;
2324                      else
2325                         *p -= bit;
2326                   }
2327          }
2328       } else {
2329          k = j->spec_start;
2330          do {
2331             int r,s;
2332             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2333             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2334             s = rs & 15;
2335             r = rs >> 4;
2336             if (s == 0) {
2337                if (r < 15) {
2338                   j->eob_run = (1 << r) - 1;
2339                   if (r)
2340                      j->eob_run += stbi__jpeg_get_bits(j, r);
2341                   r = 64; // force end of block
2342                } else {
2343                   // r=15 s=0 should write 16 0s, so we just do
2344                   // a run of 15 0s and then write s (which is 0),
2345                   // so we don't have to do anything special here
2346                }
2347             } else {
2348                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2349                // sign bit
2350                if (stbi__jpeg_get_bit(j))
2351                   s = bit;
2352                else
2353                   s = -bit;
2354             }
2355
2356             // advance by r
2357             while (k <= j->spec_end) {
2358                short *p = &data[stbi__jpeg_dezigzag[k++]];
2359                if (*p != 0) {
2360                   if (stbi__jpeg_get_bit(j))
2361                      if ((*p & bit)==0) {
2362                         if (*p > 0)
2363                            *p += bit;
2364                         else
2365                            *p -= bit;
2366                      }
2367                } else {
2368                   if (r == 0) {
2369                      *p = (short) s;
2370                      break;
2371                   }
2372                   --r;
2373                }
2374             }
2375          } while (k <= j->spec_end);
2376       }
2377    }
2378    return 1;
2379 }
2380
2381 // take a -128..127 value and stbi__clamp it and convert to 0..255
2382 stbi_inline static stbi_uc stbi__clamp(int x)
2383 {
2384    // trick to use a single test to catch both cases
2385    if ((unsigned int) x > 255) {
2386       if (x < 0) return 0;
2387       if (x > 255) return 255;
2388    }
2389    return (stbi_uc) x;
2390 }
2391
2392 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
2393 #define stbi__fsh(x)  ((x) * 4096)
2394
2395 // derived from jidctint -- DCT_ISLOW
2396 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2397    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2398    p2 = s2;                                    \
2399    p3 = s6;                                    \
2400    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
2401    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
2402    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
2403    p2 = s0;                                    \
2404    p3 = s4;                                    \
2405    t0 = stbi__fsh(p2+p3);                      \
2406    t1 = stbi__fsh(p2-p3);                      \
2407    x0 = t0+t3;                                 \
2408    x3 = t0-t3;                                 \
2409    x1 = t1+t2;                                 \
2410    x2 = t1-t2;                                 \
2411    t0 = s7;                                    \
2412    t1 = s5;                                    \
2413    t2 = s3;                                    \
2414    t3 = s1;                                    \
2415    p3 = t0+t2;                                 \
2416    p4 = t1+t3;                                 \
2417    p1 = t0+t3;                                 \
2418    p2 = t1+t2;                                 \
2419    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
2420    t0 = t0*stbi__f2f( 0.298631336f);           \
2421    t1 = t1*stbi__f2f( 2.053119869f);           \
2422    t2 = t2*stbi__f2f( 3.072711026f);           \
2423    t3 = t3*stbi__f2f( 1.501321110f);           \
2424    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
2425    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
2426    p3 = p3*stbi__f2f(-1.961570560f);           \
2427    p4 = p4*stbi__f2f(-0.390180644f);           \
2428    t3 += p1+p4;                                \
2429    t2 += p2+p3;                                \
2430    t1 += p2+p4;                                \
2431    t0 += p1+p3;
2432
2433 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
2434 {
2435    int i,val[64],*v=val;
2436    stbi_uc *o;
2437    short *d = data;
2438
2439    // columns
2440    for (i=0; i < 8; ++i,++d, ++v) {
2441       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2442       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
2443            && d[40]==0 && d[48]==0 && d[56]==0) {
2444          //    no shortcut                 0     seconds
2445          //    (1|2|3|4|5|6|7)==0          0     seconds
2446          //    all separate               -0.047 seconds
2447          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
2448          int dcterm = d[0]*4;
2449          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2450       } else {
2451          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
2452          // constants scaled things up by 1<<12; let's bring them back
2453          // down, but keep 2 extra bits of precision
2454          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2455          v[ 0] = (x0+t3) >> 10;
2456          v[56] = (x0-t3) >> 10;
2457          v[ 8] = (x1+t2) >> 10;
2458          v[48] = (x1-t2) >> 10;
2459          v[16] = (x2+t1) >> 10;
2460          v[40] = (x2-t1) >> 10;
2461          v[24] = (x3+t0) >> 10;
2462          v[32] = (x3-t0) >> 10;
2463       }
2464    }
2465
2466    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2467       // no fast case since the first 1D IDCT spread components out
2468       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2469       // constants scaled things up by 1<<12, plus we had 1<<2 from first
2470       // loop, plus horizontal and vertical each scale by sqrt(8) so together
2471       // we've got an extra 1<<3, so 1<<17 total we need to remove.
2472       // so we want to round that, which means adding 0.5 * 1<<17,
2473       // aka 65536. Also, we'll end up with -128 to 127 that we want
2474       // to encode as 0..255 by adding 128, so we'll add that before the shift
2475       x0 += 65536 + (128<<17);
2476       x1 += 65536 + (128<<17);
2477       x2 += 65536 + (128<<17);
2478       x3 += 65536 + (128<<17);
2479       // tried computing the shifts into temps, or'ing the temps to see
2480       // if any were out of range, but that was slower
2481       o[0] = stbi__clamp((x0+t3) >> 17);
2482       o[7] = stbi__clamp((x0-t3) >> 17);
2483       o[1] = stbi__clamp((x1+t2) >> 17);
2484       o[6] = stbi__clamp((x1-t2) >> 17);
2485       o[2] = stbi__clamp((x2+t1) >> 17);
2486       o[5] = stbi__clamp((x2-t1) >> 17);
2487       o[3] = stbi__clamp((x3+t0) >> 17);
2488       o[4] = stbi__clamp((x3-t0) >> 17);
2489    }
2490 }
2491
2492 #ifdef STBI_SSE2
2493 // sse2 integer IDCT. not the fastest possible implementation but it
2494 // produces bit-identical results to the generic C version so it's
2495 // fully "transparent".
2496 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2497 {
2498    // This is constructed to match our regular (generic) integer IDCT exactly.
2499    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2500    __m128i tmp;
2501
2502    // dot product constant: even elems=x, odd elems=y
2503    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2504
2505    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2506    // out(1) = c1[even]*x + c1[odd]*y
2507    #define dct_rot(out0,out1, x,y,c0,c1) \
2508       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2509       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2510       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2511       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2512       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2513       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2514
2515    // out = in << 12  (in 16-bit, out 32-bit)
2516    #define dct_widen(out, in) \
2517       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2518       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2519
2520    // wide add
2521    #define dct_wadd(out, a, b) \
2522       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2523       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2524
2525    // wide sub
2526    #define dct_wsub(out, a, b) \
2527       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2528       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2529
2530    // butterfly a/b, add bias, then shift by "s" and pack
2531    #define dct_bfly32o(out0, out1, a,b,bias,s) \
2532       { \
2533          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2534          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2535          dct_wadd(sum, abiased, b); \
2536          dct_wsub(dif, abiased, b); \
2537          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2538          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2539       }
2540
2541    // 8-bit interleave step (for transposes)
2542    #define dct_interleave8(a, b) \
2543       tmp = a; \
2544       a = _mm_unpacklo_epi8(a, b); \
2545       b = _mm_unpackhi_epi8(tmp, b)
2546
2547    // 16-bit interleave step (for transposes)
2548    #define dct_interleave16(a, b) \
2549       tmp = a; \
2550       a = _mm_unpacklo_epi16(a, b); \
2551       b = _mm_unpackhi_epi16(tmp, b)
2552
2553    #define dct_pass(bias,shift) \
2554       { \
2555          /* even part */ \
2556          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2557          __m128i sum04 = _mm_add_epi16(row0, row4); \
2558          __m128i dif04 = _mm_sub_epi16(row0, row4); \
2559          dct_widen(t0e, sum04); \
2560          dct_widen(t1e, dif04); \
2561          dct_wadd(x0, t0e, t3e); \
2562          dct_wsub(x3, t0e, t3e); \
2563          dct_wadd(x1, t1e, t2e); \
2564          dct_wsub(x2, t1e, t2e); \
2565          /* odd part */ \
2566          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2567          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2568          __m128i sum17 = _mm_add_epi16(row1, row7); \
2569          __m128i sum35 = _mm_add_epi16(row3, row5); \
2570          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2571          dct_wadd(x4, y0o, y4o); \
2572          dct_wadd(x5, y1o, y5o); \
2573          dct_wadd(x6, y2o, y5o); \
2574          dct_wadd(x7, y3o, y4o); \
2575          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2576          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2577          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2578          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2579       }
2580
2581    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2582    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2583    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2584    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2585    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2586    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2587    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2588    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2589
2590    // rounding biases in column/row passes, see stbi__idct_block for explanation.
2591    __m128i bias_0 = _mm_set1_epi32(512);
2592    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2593
2594    // load
2595    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2596    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2597    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2598    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2599    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2600    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2601    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2602    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2603
2604    // column pass
2605    dct_pass(bias_0, 10);
2606
2607    {
2608       // 16bit 8x8 transpose pass 1
2609       dct_interleave16(row0, row4);
2610       dct_interleave16(row1, row5);
2611       dct_interleave16(row2, row6);
2612       dct_interleave16(row3, row7);
2613
2614       // transpose pass 2
2615       dct_interleave16(row0, row2);
2616       dct_interleave16(row1, row3);
2617       dct_interleave16(row4, row6);
2618       dct_interleave16(row5, row7);
2619
2620       // transpose pass 3
2621       dct_interleave16(row0, row1);
2622       dct_interleave16(row2, row3);
2623       dct_interleave16(row4, row5);
2624       dct_interleave16(row6, row7);
2625    }
2626
2627    // row pass
2628    dct_pass(bias_1, 17);
2629
2630    {
2631       // pack
2632       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2633       __m128i p1 = _mm_packus_epi16(row2, row3);
2634       __m128i p2 = _mm_packus_epi16(row4, row5);
2635       __m128i p3 = _mm_packus_epi16(row6, row7);
2636
2637       // 8bit 8x8 transpose pass 1
2638       dct_interleave8(p0, p2); // a0e0a1e1...
2639       dct_interleave8(p1, p3); // c0g0c1g1...
2640
2641       // transpose pass 2
2642       dct_interleave8(p0, p1); // a0c0e0g0...
2643       dct_interleave8(p2, p3); // b0d0f0h0...
2644
2645       // transpose pass 3
2646       dct_interleave8(p0, p2); // a0b0c0d0...
2647       dct_interleave8(p1, p3); // a4b4c4d4...
2648
2649       // store
2650       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2651       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2652       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2653       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2654       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2655       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2656       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2657       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2658    }
2659
2660 #undef dct_const
2661 #undef dct_rot
2662 #undef dct_widen
2663 #undef dct_wadd
2664 #undef dct_wsub
2665 #undef dct_bfly32o
2666 #undef dct_interleave8
2667 #undef dct_interleave16
2668 #undef dct_pass
2669 }
2670
2671 #endif // STBI_SSE2
2672
2673 #ifdef STBI_NEON
2674
2675 // NEON integer IDCT. should produce bit-identical
2676 // results to the generic C version.
2677 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2678 {
2679    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2680
2681    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2682    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2683    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2684    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2685    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2686    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2687    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2688    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2689    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2690    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2691    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2692    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2693
2694 #define dct_long_mul(out, inq, coeff) \
2695    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2696    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2697
2698 #define dct_long_mac(out, acc, inq, coeff) \
2699    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2700    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2701
2702 #define dct_widen(out, inq) \
2703    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2704    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2705
2706 // wide add
2707 #define dct_wadd(out, a, b) \
2708    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2709    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2710
2711 // wide sub
2712 #define dct_wsub(out, a, b) \
2713    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2714    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2715
2716 // butterfly a/b, then shift using "shiftop" by "s" and pack
2717 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2718    { \
2719       dct_wadd(sum, a, b); \
2720       dct_wsub(dif, a, b); \
2721       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2722       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2723    }
2724
2725 #define dct_pass(shiftop, shift) \
2726    { \
2727       /* even part */ \
2728       int16x8_t sum26 = vaddq_s16(row2, row6); \
2729       dct_long_mul(p1e, sum26, rot0_0); \
2730       dct_long_mac(t2e, p1e, row6, rot0_1); \
2731       dct_long_mac(t3e, p1e, row2, rot0_2); \
2732       int16x8_t sum04 = vaddq_s16(row0, row4); \
2733       int16x8_t dif04 = vsubq_s16(row0, row4); \
2734       dct_widen(t0e, sum04); \
2735       dct_widen(t1e, dif04); \
2736       dct_wadd(x0, t0e, t3e); \
2737       dct_wsub(x3, t0e, t3e); \
2738       dct_wadd(x1, t1e, t2e); \
2739       dct_wsub(x2, t1e, t2e); \
2740       /* odd part */ \
2741       int16x8_t sum15 = vaddq_s16(row1, row5); \
2742       int16x8_t sum17 = vaddq_s16(row1, row7); \
2743       int16x8_t sum35 = vaddq_s16(row3, row5); \
2744       int16x8_t sum37 = vaddq_s16(row3, row7); \
2745       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2746       dct_long_mul(p5o, sumodd, rot1_0); \
2747       dct_long_mac(p1o, p5o, sum17, rot1_1); \
2748       dct_long_mac(p2o, p5o, sum35, rot1_2); \
2749       dct_long_mul(p3o, sum37, rot2_0); \
2750       dct_long_mul(p4o, sum15, rot2_1); \
2751       dct_wadd(sump13o, p1o, p3o); \
2752       dct_wadd(sump24o, p2o, p4o); \
2753       dct_wadd(sump23o, p2o, p3o); \
2754       dct_wadd(sump14o, p1o, p4o); \
2755       dct_long_mac(x4, sump13o, row7, rot3_0); \
2756       dct_long_mac(x5, sump24o, row5, rot3_1); \
2757       dct_long_mac(x6, sump23o, row3, rot3_2); \
2758       dct_long_mac(x7, sump14o, row1, rot3_3); \
2759       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2760       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2761       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2762       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2763    }
2764
2765    // load
2766    row0 = vld1q_s16(data + 0*8);
2767    row1 = vld1q_s16(data + 1*8);
2768    row2 = vld1q_s16(data + 2*8);
2769    row3 = vld1q_s16(data + 3*8);
2770    row4 = vld1q_s16(data + 4*8);
2771    row5 = vld1q_s16(data + 5*8);
2772    row6 = vld1q_s16(data + 6*8);
2773    row7 = vld1q_s16(data + 7*8);
2774
2775    // add DC bias
2776    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2777
2778    // column pass
2779    dct_pass(vrshrn_n_s32, 10);
2780
2781    // 16bit 8x8 transpose
2782    {
2783 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2784 // whether compilers actually get this is another story, sadly.
2785 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2786 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2787 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2788
2789       // pass 1
2790       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2791       dct_trn16(row2, row3);
2792       dct_trn16(row4, row5);
2793       dct_trn16(row6, row7);
2794
2795       // pass 2
2796       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2797       dct_trn32(row1, row3);
2798       dct_trn32(row4, row6);
2799       dct_trn32(row5, row7);
2800
2801       // pass 3
2802       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2803       dct_trn64(row1, row5);
2804       dct_trn64(row2, row6);
2805       dct_trn64(row3, row7);
2806
2807 #undef dct_trn16
2808 #undef dct_trn32
2809 #undef dct_trn64
2810    }
2811
2812    // row pass
2813    // vrshrn_n_s32 only supports shifts up to 16, we need
2814    // 17. so do a non-rounding shift of 16 first then follow
2815    // up with a rounding shift by 1.
2816    dct_pass(vshrn_n_s32, 16);
2817
2818    {
2819       // pack and round
2820       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2821       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2822       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2823       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2824       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2825       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2826       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2827       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2828
2829       // again, these can translate into one instruction, but often don't.
2830 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2831 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2832 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2833
2834       // sadly can't use interleaved stores here since we only write
2835       // 8 bytes to each scan line!
2836
2837       // 8x8 8-bit transpose pass 1
2838       dct_trn8_8(p0, p1);
2839       dct_trn8_8(p2, p3);
2840       dct_trn8_8(p4, p5);
2841       dct_trn8_8(p6, p7);
2842
2843       // pass 2
2844       dct_trn8_16(p0, p2);
2845       dct_trn8_16(p1, p3);
2846       dct_trn8_16(p4, p6);
2847       dct_trn8_16(p5, p7);
2848
2849       // pass 3
2850       dct_trn8_32(p0, p4);
2851       dct_trn8_32(p1, p5);
2852       dct_trn8_32(p2, p6);
2853       dct_trn8_32(p3, p7);
2854
2855       // store
2856       vst1_u8(out, p0); out += out_stride;
2857       vst1_u8(out, p1); out += out_stride;
2858       vst1_u8(out, p2); out += out_stride;
2859       vst1_u8(out, p3); out += out_stride;
2860       vst1_u8(out, p4); out += out_stride;
2861       vst1_u8(out, p5); out += out_stride;
2862       vst1_u8(out, p6); out += out_stride;
2863       vst1_u8(out, p7);
2864
2865 #undef dct_trn8_8
2866 #undef dct_trn8_16
2867 #undef dct_trn8_32
2868    }
2869
2870 #undef dct_long_mul
2871 #undef dct_long_mac
2872 #undef dct_widen
2873 #undef dct_wadd
2874 #undef dct_wsub
2875 #undef dct_bfly32o
2876 #undef dct_pass
2877 }
2878
2879 #endif // STBI_NEON
2880
2881 #define STBI__MARKER_none  0xff
2882 // if there's a pending marker from the entropy stream, return that
2883 // otherwise, fetch from the stream and get a marker. if there's no
2884 // marker, return 0xff, which is never a valid marker value
2885 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2886 {
2887    stbi_uc x;
2888    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2889    x = stbi__get8(j->s);
2890    if (x != 0xff) return STBI__MARKER_none;
2891    while (x == 0xff)
2892       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
2893    return x;
2894 }
2895
2896 // in each scan, we'll have scan_n components, and the order
2897 // of the components is specified by order[]
2898 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2899
2900 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2901 // the dc prediction
2902 static void stbi__jpeg_reset(stbi__jpeg *j)
2903 {
2904    j->code_bits = 0;
2905    j->code_buffer = 0;
2906    j->nomore = 0;
2907    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
2908    j->marker = STBI__MARKER_none;
2909    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2910    j->eob_run = 0;
2911    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2912    // since we don't even allow 1<<30 pixels
2913 }
2914
2915 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2916 {
2917    stbi__jpeg_reset(z);
2918    if (!z->progressive) {
2919       if (z->scan_n == 1) {
2920          int i,j;
2921          STBI_SIMD_ALIGN(short, data[64]);
2922          int n = z->order[0];
2923          // non-interleaved data, we just need to process one block at a time,
2924          // in trivial scanline order
2925          // number of blocks to do just depends on how many actual "pixels" this
2926          // component has, independent of interleaved MCU blocking and such
2927          int w = (z->img_comp[n].x+7) >> 3;
2928          int h = (z->img_comp[n].y+7) >> 3;
2929          for (j=0; j < h; ++j) {
2930             for (i=0; i < w; ++i) {
2931                int ha = z->img_comp[n].ha;
2932                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2933                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2934                // every data block is an MCU, so countdown the restart interval
2935                if (--z->todo <= 0) {
2936                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2937                   // if it's NOT a restart, then just bail, so we get corrupt data
2938                   // rather than no data
2939                   if (!STBI__RESTART(z->marker)) return 1;
2940                   stbi__jpeg_reset(z);
2941                }
2942             }
2943          }
2944          return 1;
2945       } else { // interleaved
2946          int i,j,k,x,y;
2947          STBI_SIMD_ALIGN(short, data[64]);
2948          for (j=0; j < z->img_mcu_y; ++j) {
2949             for (i=0; i < z->img_mcu_x; ++i) {
2950                // scan an interleaved mcu... process scan_n components in order
2951                for (k=0; k < z->scan_n; ++k) {
2952                   int n = z->order[k];
2953                   // scan out an mcu's worth of this component; that's just determined
2954                   // by the basic H and V specified for the component
2955                   for (y=0; y < z->img_comp[n].v; ++y) {
2956                      for (x=0; x < z->img_comp[n].h; ++x) {
2957                         int x2 = (i*z->img_comp[n].h + x)*8;
2958                         int y2 = (j*z->img_comp[n].v + y)*8;
2959                         int ha = z->img_comp[n].ha;
2960                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2961                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2962                      }
2963                   }
2964                }
2965                // after all interleaved components, that's an interleaved MCU,
2966                // so now count down the restart interval
2967                if (--z->todo <= 0) {
2968                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2969                   if (!STBI__RESTART(z->marker)) return 1;
2970                   stbi__jpeg_reset(z);
2971                }
2972             }
2973          }
2974          return 1;
2975       }
2976    } else {
2977       if (z->scan_n == 1) {
2978          int i,j;
2979          int n = z->order[0];
2980          // non-interleaved data, we just need to process one block at a time,
2981          // in trivial scanline order
2982          // number of blocks to do just depends on how many actual "pixels" this
2983          // component has, independent of interleaved MCU blocking and such
2984          int w = (z->img_comp[n].x+7) >> 3;
2985          int h = (z->img_comp[n].y+7) >> 3;
2986          for (j=0; j < h; ++j) {
2987             for (i=0; i < w; ++i) {
2988                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2989                if (z->spec_start == 0) {
2990                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2991                      return 0;
2992                } else {
2993                   int ha = z->img_comp[n].ha;
2994                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2995                      return 0;
2996                }
2997                // every data block is an MCU, so countdown the restart interval
2998                if (--z->todo <= 0) {
2999                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3000                   if (!STBI__RESTART(z->marker)) return 1;
3001                   stbi__jpeg_reset(z);
3002                }
3003             }
3004          }
3005          return 1;
3006       } else { // interleaved
3007          int i,j,k,x,y;
3008          for (j=0; j < z->img_mcu_y; ++j) {
3009             for (i=0; i < z->img_mcu_x; ++i) {
3010                // scan an interleaved mcu... process scan_n components in order
3011                for (k=0; k < z->scan_n; ++k) {
3012                   int n = z->order[k];
3013                   // scan out an mcu's worth of this component; that's just determined
3014                   // by the basic H and V specified for the component
3015                   for (y=0; y < z->img_comp[n].v; ++y) {
3016                      for (x=0; x < z->img_comp[n].h; ++x) {
3017                         int x2 = (i*z->img_comp[n].h + x);
3018                         int y2 = (j*z->img_comp[n].v + y);
3019                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
3020                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
3021                            return 0;
3022                      }
3023                   }
3024                }
3025                // after all interleaved components, that's an interleaved MCU,
3026                // so now count down the restart interval
3027                if (--z->todo <= 0) {
3028                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3029                   if (!STBI__RESTART(z->marker)) return 1;
3030                   stbi__jpeg_reset(z);
3031                }
3032             }
3033          }
3034          return 1;
3035       }
3036    }
3037 }
3038
3039 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
3040 {
3041    int i;
3042    for (i=0; i < 64; ++i)
3043       data[i] *= dequant[i];
3044 }
3045
3046 static void stbi__jpeg_finish(stbi__jpeg *z)
3047 {
3048    if (z->progressive) {
3049       // dequantize and idct the data
3050       int i,j,n;
3051       for (n=0; n < z->s->img_n; ++n) {
3052          int w = (z->img_comp[n].x+7) >> 3;
3053          int h = (z->img_comp[n].y+7) >> 3;
3054          for (j=0; j < h; ++j) {
3055             for (i=0; i < w; ++i) {
3056                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
3057                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
3058                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
3059             }
3060          }
3061       }
3062    }
3063 }
3064
3065 static int stbi__process_marker(stbi__jpeg *z, int m)
3066 {
3067    int L;
3068    switch (m) {
3069       case STBI__MARKER_none: // no marker found
3070          return stbi__err("expected marker","Corrupt JPEG");
3071
3072       case 0xDD: // DRI - specify restart interval
3073          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
3074          z->restart_interval = stbi__get16be(z->s);
3075          return 1;
3076
3077       case 0xDB: // DQT - define quantization table
3078          L = stbi__get16be(z->s)-2;
3079          while (L > 0) {
3080             int q = stbi__get8(z->s);
3081             int p = q >> 4, sixteen = (p != 0);
3082             int t = q & 15,i;
3083             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
3084             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
3085
3086             for (i=0; i < 64; ++i)
3087                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
3088             L -= (sixteen ? 129 : 65);
3089          }
3090          return L==0;
3091
3092       case 0xC4: // DHT - define huffman table
3093          L = stbi__get16be(z->s)-2;
3094          while (L > 0) {
3095             stbi_uc *v;
3096             int sizes[16],i,n=0;
3097             int q = stbi__get8(z->s);
3098             int tc = q >> 4;
3099             int th = q & 15;
3100             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
3101             for (i=0; i < 16; ++i) {
3102                sizes[i] = stbi__get8(z->s);
3103                n += sizes[i];
3104             }
3105             L -= 17;
3106             if (tc == 0) {
3107                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
3108                v = z->huff_dc[th].values;
3109             } else {
3110                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
3111                v = z->huff_ac[th].values;
3112             }
3113             for (i=0; i < n; ++i)
3114                v[i] = stbi__get8(z->s);
3115             if (tc != 0)
3116                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
3117             L -= n;
3118          }
3119          return L==0;
3120    }
3121
3122    // check for comment block or APP blocks
3123    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
3124       L = stbi__get16be(z->s);
3125       if (L < 2) {
3126          if (m == 0xFE)
3127             return stbi__err("bad COM len","Corrupt JPEG");
3128          else
3129             return stbi__err("bad APP len","Corrupt JPEG");
3130       }
3131       L -= 2;
3132
3133       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
3134          static const unsigned char tag[5] = {'J','F','I','F','\0'};
3135          int ok = 1;
3136          int i;
3137          for (i=0; i < 5; ++i)
3138             if (stbi__get8(z->s) != tag[i])
3139                ok = 0;
3140          L -= 5;
3141          if (ok)
3142             z->jfif = 1;
3143       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
3144          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
3145          int ok = 1;
3146          int i;
3147          for (i=0; i < 6; ++i)
3148             if (stbi__get8(z->s) != tag[i])
3149                ok = 0;
3150          L -= 6;
3151          if (ok) {
3152             stbi__get8(z->s); // version
3153             stbi__get16be(z->s); // flags0
3154             stbi__get16be(z->s); // flags1
3155             z->app14_color_transform = stbi__get8(z->s); // color transform
3156             L -= 6;
3157          }
3158       }
3159
3160       stbi__skip(z->s, L);
3161       return 1;
3162    }
3163
3164    return stbi__err("unknown marker","Corrupt JPEG");
3165 }
3166
3167 // after we see SOS
3168 static int stbi__process_scan_header(stbi__jpeg *z)
3169 {
3170    int i;
3171    int Ls = stbi__get16be(z->s);
3172    z->scan_n = stbi__get8(z->s);
3173    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
3174    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
3175    for (i=0; i < z->scan_n; ++i) {
3176       int id = stbi__get8(z->s), which;
3177       int q = stbi__get8(z->s);
3178       for (which = 0; which < z->s->img_n; ++which)
3179          if (z->img_comp[which].id == id)
3180             break;
3181       if (which == z->s->img_n) return 0; // no match
3182       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
3183       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
3184       z->order[i] = which;
3185    }
3186
3187    {
3188       int aa;
3189       z->spec_start = stbi__get8(z->s);
3190       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
3191       aa = stbi__get8(z->s);
3192       z->succ_high = (aa >> 4);
3193       z->succ_low  = (aa & 15);
3194       if (z->progressive) {
3195          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
3196             return stbi__err("bad SOS", "Corrupt JPEG");
3197       } else {
3198          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
3199          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
3200          z->spec_end = 63;
3201       }
3202    }
3203
3204    return 1;
3205 }
3206
3207 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
3208 {
3209    int i;
3210    for (i=0; i < ncomp; ++i) {
3211       if (z->img_comp[i].raw_data) {
3212          STBI_FREE(z->img_comp[i].raw_data);
3213          z->img_comp[i].raw_data = NULL;
3214          z->img_comp[i].data = NULL;
3215       }
3216       if (z->img_comp[i].raw_coeff) {
3217          STBI_FREE(z->img_comp[i].raw_coeff);
3218          z->img_comp[i].raw_coeff = 0;
3219          z->img_comp[i].coeff = 0;
3220       }
3221       if (z->img_comp[i].linebuf) {
3222          STBI_FREE(z->img_comp[i].linebuf);
3223          z->img_comp[i].linebuf = NULL;
3224       }
3225    }
3226    return why;
3227 }
3228
3229 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
3230 {
3231    stbi__context *s = z->s;
3232    int Lf,p,i,q, h_max=1,v_max=1,c;
3233    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
3234    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
3235    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
3236    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
3237    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3238    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3239    c = stbi__get8(s);
3240    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
3241    s->img_n = c;
3242    for (i=0; i < c; ++i) {
3243       z->img_comp[i].data = NULL;
3244       z->img_comp[i].linebuf = NULL;
3245    }
3246
3247    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
3248
3249    z->rgb = 0;
3250    for (i=0; i < s->img_n; ++i) {
3251       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
3252       z->img_comp[i].id = stbi__get8(s);
3253       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
3254          ++z->rgb;
3255       q = stbi__get8(s);
3256       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
3257       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
3258       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
3259    }
3260
3261    if (scan != STBI__SCAN_load) return 1;
3262
3263    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3264
3265    for (i=0; i < s->img_n; ++i) {
3266       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3267       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3268    }
3269
3270    // compute interleaved mcu info
3271    z->img_h_max = h_max;
3272    z->img_v_max = v_max;
3273    z->img_mcu_w = h_max * 8;
3274    z->img_mcu_h = v_max * 8;
3275    // these sizes can't be more than 17 bits
3276    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
3277    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
3278
3279    for (i=0; i < s->img_n; ++i) {
3280       // number of effective pixels (e.g. for non-interleaved MCU)
3281       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
3282       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
3283       // to simplify generation, we'll allocate enough memory to decode
3284       // the bogus oversized data from using interleaved MCUs and their
3285       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3286       // discard the extra data until colorspace conversion
3287       //
3288       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3289       // so these muls can't overflow with 32-bit ints (which we require)
3290       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3291       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3292       z->img_comp[i].coeff = 0;
3293       z->img_comp[i].raw_coeff = 0;
3294       z->img_comp[i].linebuf = NULL;
3295       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3296       if (z->img_comp[i].raw_data == NULL)
3297          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3298       // align blocks for idct using mmx/sse
3299       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
3300       if (z->progressive) {
3301          // w2, h2 are multiples of 8 (see above)
3302          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3303          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3304          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3305          if (z->img_comp[i].raw_coeff == NULL)
3306             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3307          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
3308       }
3309    }
3310
3311    return 1;
3312 }
3313
3314 // use comparisons since in some cases we handle more than one case (e.g. SOF)
3315 #define stbi__DNL(x)         ((x) == 0xdc)
3316 #define stbi__SOI(x)         ((x) == 0xd8)
3317 #define stbi__EOI(x)         ((x) == 0xd9)
3318 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3319 #define stbi__SOS(x)         ((x) == 0xda)
3320
3321 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
3322
3323 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
3324 {
3325    int m;
3326    z->jfif = 0;
3327    z->app14_color_transform = -1; // valid values are 0,1,2
3328    z->marker = STBI__MARKER_none; // initialize cached marker to empty
3329    m = stbi__get_marker(z);
3330    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
3331    if (scan == STBI__SCAN_type) return 1;
3332    m = stbi__get_marker(z);
3333    while (!stbi__SOF(m)) {
3334       if (!stbi__process_marker(z,m)) return 0;
3335       m = stbi__get_marker(z);
3336       while (m == STBI__MARKER_none) {
3337          // some files have extra padding after their blocks, so ok, we'll scan
3338          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
3339          m = stbi__get_marker(z);
3340       }
3341    }
3342    z->progressive = stbi__SOF_progressive(m);
3343    if (!stbi__process_frame_header(z, scan)) return 0;
3344    return 1;
3345 }
3346
3347 // decode image to YCbCr format
3348 static int stbi__decode_jpeg_image(stbi__jpeg *j)
3349 {
3350    int m;
3351    for (m = 0; m < 4; m++) {
3352       j->img_comp[m].raw_data = NULL;
3353       j->img_comp[m].raw_coeff = NULL;
3354    }
3355    j->restart_interval = 0;
3356    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
3357    m = stbi__get_marker(j);
3358    while (!stbi__EOI(m)) {
3359       if (stbi__SOS(m)) {
3360          if (!stbi__process_scan_header(j)) return 0;
3361          if (!stbi__parse_entropy_coded_data(j)) return 0;
3362          if (j->marker == STBI__MARKER_none ) {
3363             // handle 0s at the end of image data from IP Kamera 9060
3364             while (!stbi__at_eof(j->s)) {
3365                int x = stbi__get8(j->s);
3366                if (x == 255) {
3367                   j->marker = stbi__get8(j->s);
3368                   break;
3369                }
3370             }
3371             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
3372          }
3373       } else if (stbi__DNL(m)) {
3374          int Ld = stbi__get16be(j->s);
3375          stbi__uint32 NL = stbi__get16be(j->s);
3376          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
3377          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
3378       } else {
3379          if (!stbi__process_marker(j, m)) return 0;
3380       }
3381       m = stbi__get_marker(j);
3382    }
3383    if (j->progressive)
3384       stbi__jpeg_finish(j);
3385    return 1;
3386 }
3387
3388 // static jfif-centered resampling (across block boundaries)
3389
3390 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
3391                                     int w, int hs);
3392
3393 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
3394
3395 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3396 {
3397    STBI_NOTUSED(out);
3398    STBI_NOTUSED(in_far);
3399    STBI_NOTUSED(w);
3400    STBI_NOTUSED(hs);
3401    return in_near;
3402 }
3403
3404 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3405 {
3406    // need to generate two samples vertically for every one in input
3407    int i;
3408    STBI_NOTUSED(hs);
3409    for (i=0; i < w; ++i)
3410       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
3411    return out;
3412 }
3413
3414 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3415 {
3416    // need to generate two samples horizontally for every one in input
3417    int i;
3418    stbi_uc *input = in_near;
3419
3420    if (w == 1) {
3421       // if only one sample, can't do any interpolation
3422       out[0] = out[1] = input[0];
3423       return out;
3424    }
3425
3426    out[0] = input[0];
3427    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
3428    for (i=1; i < w-1; ++i) {
3429       int n = 3*input[i]+2;
3430       out[i*2+0] = stbi__div4(n+input[i-1]);
3431       out[i*2+1] = stbi__div4(n+input[i+1]);
3432    }
3433    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
3434    out[i*2+1] = input[w-1];
3435
3436    STBI_NOTUSED(in_far);
3437    STBI_NOTUSED(hs);
3438
3439    return out;
3440 }
3441
3442 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
3443
3444 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3445 {
3446    // need to generate 2x2 samples for every one in input
3447    int i,t0,t1;
3448    if (w == 1) {
3449       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3450       return out;
3451    }
3452
3453    t1 = 3*in_near[0] + in_far[0];
3454    out[0] = stbi__div4(t1+2);
3455    for (i=1; i < w; ++i) {
3456       t0 = t1;
3457       t1 = 3*in_near[i]+in_far[i];
3458       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3459       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3460    }
3461    out[w*2-1] = stbi__div4(t1+2);
3462
3463    STBI_NOTUSED(hs);
3464
3465    return out;
3466 }
3467
3468 #if defined(STBI_SSE2) || defined(STBI_NEON)
3469 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3470 {
3471    // need to generate 2x2 samples for every one in input
3472    int i=0,t0,t1;
3473
3474    if (w == 1) {
3475       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3476       return out;
3477    }
3478
3479    t1 = 3*in_near[0] + in_far[0];
3480    // process groups of 8 pixels for as long as we can.
3481    // note we can't handle the last pixel in a row in this loop
3482    // because we need to handle the filter boundary conditions.
3483    for (; i < ((w-1) & ~7); i += 8) {
3484 #if defined(STBI_SSE2)
3485       // load and perform the vertical filtering pass
3486       // this uses 3*x + y = 4*x + (y - x)
3487       __m128i zero  = _mm_setzero_si128();
3488       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
3489       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
3490       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
3491       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3492       __m128i diff  = _mm_sub_epi16(farw, nearw);
3493       __m128i nears = _mm_slli_epi16(nearw, 2);
3494       __m128i curr  = _mm_add_epi16(nears, diff); // current row
3495
3496       // horizontal filter works the same based on shifted vers of current
3497       // row. "prev" is current row shifted right by 1 pixel; we need to
3498       // insert the previous pixel value (from t1).
3499       // "next" is current row shifted left by 1 pixel, with first pixel
3500       // of next block of 8 pixels added in.
3501       __m128i prv0 = _mm_slli_si128(curr, 2);
3502       __m128i nxt0 = _mm_srli_si128(curr, 2);
3503       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3504       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
3505
3506       // horizontal filter, polyphase implementation since it's convenient:
3507       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3508       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3509       // note the shared term.
3510       __m128i bias  = _mm_set1_epi16(8);
3511       __m128i curs = _mm_slli_epi16(curr, 2);
3512       __m128i prvd = _mm_sub_epi16(prev, curr);
3513       __m128i nxtd = _mm_sub_epi16(next, curr);
3514       __m128i curb = _mm_add_epi16(curs, bias);
3515       __m128i even = _mm_add_epi16(prvd, curb);
3516       __m128i odd  = _mm_add_epi16(nxtd, curb);
3517
3518       // interleave even and odd pixels, then undo scaling.
3519       __m128i int0 = _mm_unpacklo_epi16(even, odd);
3520       __m128i int1 = _mm_unpackhi_epi16(even, odd);
3521       __m128i de0  = _mm_srli_epi16(int0, 4);
3522       __m128i de1  = _mm_srli_epi16(int1, 4);
3523
3524       // pack and write output
3525       __m128i outv = _mm_packus_epi16(de0, de1);
3526       _mm_storeu_si128((__m128i *) (out + i*2), outv);
3527 #elif defined(STBI_NEON)
3528       // load and perform the vertical filtering pass
3529       // this uses 3*x + y = 4*x + (y - x)
3530       uint8x8_t farb  = vld1_u8(in_far + i);
3531       uint8x8_t nearb = vld1_u8(in_near + i);
3532       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3533       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3534       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3535
3536       // horizontal filter works the same based on shifted vers of current
3537       // row. "prev" is current row shifted right by 1 pixel; we need to
3538       // insert the previous pixel value (from t1).
3539       // "next" is current row shifted left by 1 pixel, with first pixel
3540       // of next block of 8 pixels added in.
3541       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3542       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3543       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3544       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3545
3546       // horizontal filter, polyphase implementation since it's convenient:
3547       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3548       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3549       // note the shared term.
3550       int16x8_t curs = vshlq_n_s16(curr, 2);
3551       int16x8_t prvd = vsubq_s16(prev, curr);
3552       int16x8_t nxtd = vsubq_s16(next, curr);
3553       int16x8_t even = vaddq_s16(curs, prvd);
3554       int16x8_t odd  = vaddq_s16(curs, nxtd);
3555
3556       // undo scaling and round, then store with even/odd phases interleaved
3557       uint8x8x2_t o;
3558       o.val[0] = vqrshrun_n_s16(even, 4);
3559       o.val[1] = vqrshrun_n_s16(odd,  4);
3560       vst2_u8(out + i*2, o);
3561 #endif
3562
3563       // "previous" value for next iter
3564       t1 = 3*in_near[i+7] + in_far[i+7];
3565    }
3566
3567    t0 = t1;
3568    t1 = 3*in_near[i] + in_far[i];
3569    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3570
3571    for (++i; i < w; ++i) {
3572       t0 = t1;
3573       t1 = 3*in_near[i]+in_far[i];
3574       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3575       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3576    }
3577    out[w*2-1] = stbi__div4(t1+2);
3578
3579    STBI_NOTUSED(hs);
3580
3581    return out;
3582 }
3583 #endif
3584
3585 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3586 {
3587    // resample with nearest-neighbor
3588    int i,j;
3589    STBI_NOTUSED(in_far);
3590    for (i=0; i < w; ++i)
3591       for (j=0; j < hs; ++j)
3592          out[i*hs+j] = in_near[i];
3593    return out;
3594 }
3595
3596 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3597 // to make sure the code produces the same results in both SIMD and scalar
3598 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
3599 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3600 {
3601    int i;
3602    for (i=0; i < count; ++i) {
3603       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3604       int r,g,b;
3605       int cr = pcr[i] - 128;
3606       int cb = pcb[i] - 128;
3607       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
3608       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3609       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
3610       r >>= 20;
3611       g >>= 20;
3612       b >>= 20;
3613       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3614       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3615       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3616       out[0] = (stbi_uc)r;
3617       out[1] = (stbi_uc)g;
3618       out[2] = (stbi_uc)b;
3619       out[3] = 255;
3620       out += step;
3621    }
3622 }
3623
3624 #if defined(STBI_SSE2) || defined(STBI_NEON)
3625 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3626 {
3627    int i = 0;
3628
3629 #ifdef STBI_SSE2
3630    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3631    // it's useful in practice (you wouldn't use it for textures, for example).
3632    // so just accelerate step == 4 case.
3633    if (step == 4) {
3634       // this is a fairly straightforward implementation and not super-optimized.
3635       __m128i signflip  = _mm_set1_epi8(-0x80);
3636       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3637       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3638       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3639       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3640       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3641       __m128i xw = _mm_set1_epi16(255); // alpha channel
3642
3643       for (; i+7 < count; i += 8) {
3644          // load
3645          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3646          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3647          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3648          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3649          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3650
3651          // unpack to short (and left-shift cr, cb by 8)
3652          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3653          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3654          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3655
3656          // color transform
3657          __m128i yws = _mm_srli_epi16(yw, 4);
3658          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3659          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3660          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3661          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3662          __m128i rws = _mm_add_epi16(cr0, yws);
3663          __m128i gwt = _mm_add_epi16(cb0, yws);
3664          __m128i bws = _mm_add_epi16(yws, cb1);
3665          __m128i gws = _mm_add_epi16(gwt, cr1);
3666
3667          // descale
3668          __m128i rw = _mm_srai_epi16(rws, 4);
3669          __m128i bw = _mm_srai_epi16(bws, 4);
3670          __m128i gw = _mm_srai_epi16(gws, 4);
3671
3672          // back to byte, set up for transpose
3673          __m128i brb = _mm_packus_epi16(rw, bw);
3674          __m128i gxb = _mm_packus_epi16(gw, xw);
3675
3676          // transpose to interleave channels
3677          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3678          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3679          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3680          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3681
3682          // store
3683          _mm_storeu_si128((__m128i *) (out + 0), o0);
3684          _mm_storeu_si128((__m128i *) (out + 16), o1);
3685          out += 32;
3686       }
3687    }
3688 #endif
3689
3690 #ifdef STBI_NEON
3691    // in this version, step=3 support would be easy to add. but is there demand?
3692    if (step == 4) {
3693       // this is a fairly straightforward implementation and not super-optimized.
3694       uint8x8_t signflip = vdup_n_u8(0x80);
3695       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3696       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3697       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3698       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3699
3700       for (; i+7 < count; i += 8) {
3701          // load
3702          uint8x8_t y_bytes  = vld1_u8(y + i);
3703          uint8x8_t cr_bytes = vld1_u8(pcr + i);
3704          uint8x8_t cb_bytes = vld1_u8(pcb + i);
3705          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3706          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3707
3708          // expand to s16
3709          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3710          int16x8_t crw = vshll_n_s8(cr_biased, 7);
3711          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3712
3713          // color transform
3714          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3715          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3716          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3717          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3718          int16x8_t rws = vaddq_s16(yws, cr0);
3719          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3720          int16x8_t bws = vaddq_s16(yws, cb1);
3721
3722          // undo scaling, round, convert to byte
3723          uint8x8x4_t o;
3724          o.val[0] = vqrshrun_n_s16(rws, 4);
3725          o.val[1] = vqrshrun_n_s16(gws, 4);
3726          o.val[2] = vqrshrun_n_s16(bws, 4);
3727          o.val[3] = vdup_n_u8(255);
3728
3729          // store, interleaving r/g/b/a
3730          vst4_u8(out, o);
3731          out += 8*4;
3732       }
3733    }
3734 #endif
3735
3736    for (; i < count; ++i) {
3737       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3738       int r,g,b;
3739       int cr = pcr[i] - 128;
3740       int cb = pcb[i] - 128;
3741       r = y_fixed + cr* stbi__float2fixed(1.40200f);
3742       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3743       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
3744       r >>= 20;
3745       g >>= 20;
3746       b >>= 20;
3747       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3748       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3749       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3750       out[0] = (stbi_uc)r;
3751       out[1] = (stbi_uc)g;
3752       out[2] = (stbi_uc)b;
3753       out[3] = 255;
3754       out += step;
3755    }
3756 }
3757 #endif
3758
3759 // set up the kernels
3760 static void stbi__setup_jpeg(stbi__jpeg *j)
3761 {
3762    j->idct_block_kernel = stbi__idct_block;
3763    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3764    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3765
3766 #ifdef STBI_SSE2
3767    if (stbi__sse2_available()) {
3768       j->idct_block_kernel = stbi__idct_simd;
3769       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3770       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3771    }
3772 #endif
3773
3774 #ifdef STBI_NEON
3775    j->idct_block_kernel = stbi__idct_simd;
3776    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3777    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3778 #endif
3779 }
3780
3781 // clean up the temporary component buffers
3782 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3783 {
3784    stbi__free_jpeg_components(j, j->s->img_n, 0);
3785 }
3786
3787 typedef struct
3788 {
3789    resample_row_func resample;
3790    stbi_uc *line0,*line1;
3791    int hs,vs;   // expansion factor in each axis
3792    int w_lores; // horizontal pixels pre-expansion
3793    int ystep;   // how far through vertical expansion we are
3794    int ypos;    // which pre-expansion row we're on
3795 } stbi__resample;
3796
3797 // fast 0..255 * 0..255 => 0..255 rounded multiplication
3798 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
3799 {
3800    unsigned int t = x*y + 128;
3801    return (stbi_uc) ((t + (t >>8)) >> 8);
3802 }
3803
3804 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3805 {
3806    int n, decode_n, is_rgb;
3807    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3808
3809    // validate req_comp
3810    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3811
3812    // load a jpeg image from whichever source, but leave in YCbCr format
3813    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3814
3815    // determine actual number of components to generate
3816    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
3817
3818    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
3819
3820    if (z->s->img_n == 3 && n < 3 && !is_rgb)
3821       decode_n = 1;
3822    else
3823       decode_n = z->s->img_n;
3824
3825    // nothing to do if no components requested; check this now to avoid
3826    // accessing uninitialized coutput[0] later
3827    if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
3828
3829    // resample and color-convert
3830    {
3831       int k;
3832       unsigned int i,j;
3833       stbi_uc *output;
3834       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
3835
3836       stbi__resample res_comp[4];
3837
3838       for (k=0; k < decode_n; ++k) {
3839          stbi__resample *r = &res_comp[k];
3840
3841          // allocate line buffer big enough for upsampling off the edges
3842          // with upsample factor of 4
3843          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3844          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3845
3846          r->hs      = z->img_h_max / z->img_comp[k].h;
3847          r->vs      = z->img_v_max / z->img_comp[k].v;
3848          r->ystep   = r->vs >> 1;
3849          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3850          r->ypos    = 0;
3851          r->line0   = r->line1 = z->img_comp[k].data;
3852
3853          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3854          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3855          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3856          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3857          else                               r->resample = stbi__resample_row_generic;
3858       }
3859
3860       // can't error after this so, this is safe
3861       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3862       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3863
3864       // now go ahead and resample
3865       for (j=0; j < z->s->img_y; ++j) {
3866          stbi_uc *out = output + n * z->s->img_x * j;
3867          for (k=0; k < decode_n; ++k) {
3868             stbi__resample *r = &res_comp[k];
3869             int y_bot = r->ystep >= (r->vs >> 1);
3870             coutput[k] = r->resample(z->img_comp[k].linebuf,
3871                                      y_bot ? r->line1 : r->line0,
3872                                      y_bot ? r->line0 : r->line1,
3873                                      r->w_lores, r->hs);
3874             if (++r->ystep >= r->vs) {
3875                r->ystep = 0;
3876                r->line0 = r->line1;
3877                if (++r->ypos < z->img_comp[k].y)
3878                   r->line1 += z->img_comp[k].w2;
3879             }
3880          }
3881          if (n >= 3) {
3882             stbi_uc *y = coutput[0];
3883             if (z->s->img_n == 3) {
3884                if (is_rgb) {
3885                   for (i=0; i < z->s->img_x; ++i) {
3886                      out[0] = y[i];
3887                      out[1] = coutput[1][i];
3888                      out[2] = coutput[2][i];
3889                      out[3] = 255;
3890                      out += n;
3891                   }
3892                } else {
3893                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3894                }
3895             } else if (z->s->img_n == 4) {
3896                if (z->app14_color_transform == 0) { // CMYK
3897                   for (i=0; i < z->s->img_x; ++i) {
3898                      stbi_uc m = coutput[3][i];
3899                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
3900                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
3901                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
3902                      out[3] = 255;
3903                      out += n;
3904                   }
3905                } else if (z->app14_color_transform == 2) { // YCCK
3906                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3907                   for (i=0; i < z->s->img_x; ++i) {
3908                      stbi_uc m = coutput[3][i];
3909                      out[0] = stbi__blinn_8x8(255 - out[0], m);
3910                      out[1] = stbi__blinn_8x8(255 - out[1], m);
3911                      out[2] = stbi__blinn_8x8(255 - out[2], m);
3912                      out += n;
3913                   }
3914                } else { // YCbCr + alpha?  Ignore the fourth channel for now
3915                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3916                }
3917             } else
3918                for (i=0; i < z->s->img_x; ++i) {
3919                   out[0] = out[1] = out[2] = y[i];
3920                   out[3] = 255; // not used if n==3
3921                   out += n;
3922                }
3923          } else {
3924             if (is_rgb) {
3925                if (n == 1)
3926                   for (i=0; i < z->s->img_x; ++i)
3927                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3928                else {
3929                   for (i=0; i < z->s->img_x; ++i, out += 2) {
3930                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3931                      out[1] = 255;
3932                   }
3933                }
3934             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
3935                for (i=0; i < z->s->img_x; ++i) {
3936                   stbi_uc m = coutput[3][i];
3937                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
3938                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
3939                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
3940                   out[0] = stbi__compute_y(r, g, b);
3941                   out[1] = 255;
3942                   out += n;
3943                }
3944             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
3945                for (i=0; i < z->s->img_x; ++i) {
3946                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
3947                   out[1] = 255;
3948                   out += n;
3949                }
3950             } else {
3951                stbi_uc *y = coutput[0];
3952                if (n == 1)
3953                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
3954                else
3955                   for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
3956             }
3957          }
3958       }
3959       stbi__cleanup_jpeg(z);
3960       *out_x = z->s->img_x;
3961       *out_y = z->s->img_y;
3962       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
3963       return output;
3964    }
3965 }
3966
3967 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
3968 {
3969    unsigned char* result;
3970    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
3971    if (!j) return stbi__errpuc("outofmem", "Out of memory");
3972    STBI_NOTUSED(ri);
3973    j->s = s;
3974    stbi__setup_jpeg(j);
3975    result = load_jpeg_image(j, x,y,comp,req_comp);
3976    STBI_FREE(j);
3977    return result;
3978 }
3979
3980 static int stbi__jpeg_test(stbi__context *s)
3981 {
3982    int r;
3983    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
3984    if (!j) return stbi__err("outofmem", "Out of memory");
3985    j->s = s;
3986    stbi__setup_jpeg(j);
3987    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
3988    stbi__rewind(s);
3989    STBI_FREE(j);
3990    return r;
3991 }
3992
3993 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3994 {
3995    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3996       stbi__rewind( j->s );
3997       return 0;
3998    }
3999    if (x) *x = j->s->img_x;
4000    if (y) *y = j->s->img_y;
4001    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
4002    return 1;
4003 }
4004
4005 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
4006 {
4007    int result;
4008    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
4009    if (!j) return stbi__err("outofmem", "Out of memory");
4010    j->s = s;
4011    result = stbi__jpeg_info_raw(j, x, y, comp);
4012    STBI_FREE(j);
4013    return result;
4014 }
4015 #endif
4016
4017 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
4018 //    simple implementation
4019 //      - all input must be provided in an upfront buffer
4020 //      - all output is written to a single output buffer (can malloc/realloc)
4021 //    performance
4022 //      - fast huffman
4023
4024 #ifndef STBI_NO_ZLIB
4025
4026 // fast-way is faster to check than jpeg huffman, but slow way is slower
4027 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
4028 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
4029 #define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
4030
4031 // zlib-style huffman encoding
4032 // (jpegs packs from left, zlib from right, so can't share code)
4033 typedef struct
4034 {
4035    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
4036    stbi__uint16 firstcode[16];
4037    int maxcode[17];
4038    stbi__uint16 firstsymbol[16];
4039    stbi_uc  size[STBI__ZNSYMS];
4040    stbi__uint16 value[STBI__ZNSYMS];
4041 } stbi__zhuffman;
4042
4043 stbi_inline static int stbi__bitreverse16(int n)
4044 {
4045   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
4046   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
4047   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
4048   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
4049   return n;
4050 }
4051
4052 stbi_inline static int stbi__bit_reverse(int v, int bits)
4053 {
4054    STBI_ASSERT(bits <= 16);
4055    // to bit reverse n bits, reverse 16 and shift
4056    // e.g. 11 bits, bit reverse and shift away 5
4057    return stbi__bitreverse16(v) >> (16-bits);
4058 }
4059
4060 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
4061 {
4062    int i,k=0;
4063    int code, next_code[16], sizes[17];
4064
4065    // DEFLATE spec for generating codes
4066    memset(sizes, 0, sizeof(sizes));
4067    memset(z->fast, 0, sizeof(z->fast));
4068    for (i=0; i < num; ++i)
4069       ++sizes[sizelist[i]];
4070    sizes[0] = 0;
4071    for (i=1; i < 16; ++i)
4072       if (sizes[i] > (1 << i))
4073          return stbi__err("bad sizes", "Corrupt PNG");
4074    code = 0;
4075    for (i=1; i < 16; ++i) {
4076       next_code[i] = code;
4077       z->firstcode[i] = (stbi__uint16) code;
4078       z->firstsymbol[i] = (stbi__uint16) k;
4079       code = (code + sizes[i]);
4080       if (sizes[i])
4081          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
4082       z->maxcode[i] = code << (16-i); // preshift for inner loop
4083       code <<= 1;
4084       k += sizes[i];
4085    }
4086    z->maxcode[16] = 0x10000; // sentinel
4087    for (i=0; i < num; ++i) {
4088       int s = sizelist[i];
4089       if (s) {
4090          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
4091          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
4092          z->size [c] = (stbi_uc     ) s;
4093          z->value[c] = (stbi__uint16) i;
4094          if (s <= STBI__ZFAST_BITS) {
4095             int j = stbi__bit_reverse(next_code[s],s);
4096             while (j < (1 << STBI__ZFAST_BITS)) {
4097                z->fast[j] = fastv;
4098                j += (1 << s);
4099             }
4100          }
4101          ++next_code[s];
4102       }
4103    }
4104    return 1;
4105 }
4106
4107 // zlib-from-memory implementation for PNG reading
4108 //    because PNG allows splitting the zlib stream arbitrarily,
4109 //    and it's annoying structurally to have PNG call ZLIB call PNG,
4110 //    we require PNG read all the IDATs and combine them into a single
4111 //    memory buffer
4112
4113 typedef struct
4114 {
4115    stbi_uc *zbuffer, *zbuffer_end;
4116    int num_bits;
4117    stbi__uint32 code_buffer;
4118
4119    char *zout;
4120    char *zout_start;
4121    char *zout_end;
4122    int   z_expandable;
4123
4124    stbi__zhuffman z_length, z_distance;
4125 } stbi__zbuf;
4126
4127 stbi_inline static int stbi__zeof(stbi__zbuf *z)
4128 {
4129    return (z->zbuffer >= z->zbuffer_end);
4130 }
4131
4132 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
4133 {
4134    return stbi__zeof(z) ? 0 : *z->zbuffer++;
4135 }
4136
4137 static void stbi__fill_bits(stbi__zbuf *z)
4138 {
4139    do {
4140       if (z->code_buffer >= (1U << z->num_bits)) {
4141         z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
4142         return;
4143       }
4144       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
4145       z->num_bits += 8;
4146    } while (z->num_bits <= 24);
4147 }
4148
4149 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
4150 {
4151    unsigned int k;
4152    if (z->num_bits < n) stbi__fill_bits(z);
4153    k = z->code_buffer & ((1 << n) - 1);
4154    z->code_buffer >>= n;
4155    z->num_bits -= n;
4156    return k;
4157 }
4158
4159 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
4160 {
4161    int b,s,k;
4162    // not resolved by fast table, so compute it the slow way
4163    // use jpeg approach, which requires MSbits at top
4164    k = stbi__bit_reverse(a->code_buffer, 16);
4165    for (s=STBI__ZFAST_BITS+1; ; ++s)
4166       if (k < z->maxcode[s])
4167          break;
4168    if (s >= 16) return -1; // invalid code!
4169    // code size is s, so:
4170    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
4171    if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
4172    if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
4173    a->code_buffer >>= s;
4174    a->num_bits -= s;
4175    return z->value[b];
4176 }
4177
4178 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
4179 {
4180    int b,s;
4181    if (a->num_bits < 16) {
4182       if (stbi__zeof(a)) {
4183          return -1;   /* report error for unexpected end of data. */
4184       }
4185       stbi__fill_bits(a);
4186    }
4187    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
4188    if (b) {
4189       s = b >> 9;
4190       a->code_buffer >>= s;
4191       a->num_bits -= s;
4192       return b & 511;
4193    }
4194    return stbi__zhuffman_decode_slowpath(a, z);
4195 }
4196
4197 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
4198 {
4199    char *q;
4200    unsigned int cur, limit, old_limit;
4201    z->zout = zout;
4202    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
4203    cur   = (unsigned int) (z->zout - z->zout_start);
4204    limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
4205    if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
4206    while (cur + n > limit) {
4207       if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
4208       limit *= 2;
4209    }
4210    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
4211    STBI_NOTUSED(old_limit);
4212    if (q == NULL) return stbi__err("outofmem", "Out of memory");
4213    z->zout_start = q;
4214    z->zout       = q + cur;
4215    z->zout_end   = q + limit;
4216    return 1;
4217 }
4218
4219 static const int stbi__zlength_base[31] = {
4220    3,4,5,6,7,8,9,10,11,13,
4221    15,17,19,23,27,31,35,43,51,59,
4222    67,83,99,115,131,163,195,227,258,0,0 };
4223
4224 static const int stbi__zlength_extra[31]=
4225 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
4226
4227 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
4228 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
4229
4230 static const int stbi__zdist_extra[32] =
4231 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
4232
4233 static int stbi__parse_huffman_block(stbi__zbuf *a)
4234 {
4235    char *zout = a->zout;
4236    for(;;) {
4237       int z = stbi__zhuffman_decode(a, &a->z_length);
4238       if (z < 256) {
4239          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
4240          if (zout >= a->zout_end) {
4241             if (!stbi__zexpand(a, zout, 1)) return 0;
4242             zout = a->zout;
4243          }
4244          *zout++ = (char) z;
4245       } else {
4246          stbi_uc *p;
4247          int len,dist;
4248          if (z == 256) {
4249             a->zout = zout;
4250             return 1;
4251          }
4252          z -= 257;
4253          len = stbi__zlength_base[z];
4254          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
4255          z = stbi__zhuffman_decode(a, &a->z_distance);
4256          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
4257          dist = stbi__zdist_base[z];
4258          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
4259          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
4260          if (zout + len > a->zout_end) {
4261             if (!stbi__zexpand(a, zout, len)) return 0;
4262             zout = a->zout;
4263          }
4264          p = (stbi_uc *) (zout - dist);
4265          if (dist == 1) { // run of one byte; common in images.
4266             stbi_uc v = *p;
4267             if (len) { do *zout++ = v; while (--len); }
4268          } else {
4269             if (len) { do *zout++ = *p++; while (--len); }
4270          }
4271       }
4272    }
4273 }
4274
4275 static int stbi__compute_huffman_codes(stbi__zbuf *a)
4276 {
4277    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
4278    stbi__zhuffman z_codelength;
4279    stbi_uc lencodes[286+32+137];//padding for maximum single op
4280    stbi_uc codelength_sizes[19];
4281    int i,n;
4282
4283    int hlit  = stbi__zreceive(a,5) + 257;
4284    int hdist = stbi__zreceive(a,5) + 1;
4285    int hclen = stbi__zreceive(a,4) + 4;
4286    int ntot  = hlit + hdist;
4287
4288    memset(codelength_sizes, 0, sizeof(codelength_sizes));
4289    for (i=0; i < hclen; ++i) {
4290       int s = stbi__zreceive(a,3);
4291       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
4292    }
4293    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
4294
4295    n = 0;
4296    while (n < ntot) {
4297       int c = stbi__zhuffman_decode(a, &z_codelength);
4298       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4299       if (c < 16)
4300          lencodes[n++] = (stbi_uc) c;
4301       else {
4302          stbi_uc fill = 0;
4303          if (c == 16) {
4304             c = stbi__zreceive(a,2)+3;
4305             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4306             fill = lencodes[n-1];
4307          } else if (c == 17) {
4308             c = stbi__zreceive(a,3)+3;
4309          } else if (c == 18) {
4310             c = stbi__zreceive(a,7)+11;
4311          } else {
4312             return stbi__err("bad codelengths", "Corrupt PNG");
4313          }
4314          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4315          memset(lencodes+n, fill, c);
4316          n += c;
4317       }
4318    }
4319    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
4320    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4321    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
4322    return 1;
4323 }
4324
4325 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4326 {
4327    stbi_uc header[4];
4328    int len,nlen,k;
4329    if (a->num_bits & 7)
4330       stbi__zreceive(a, a->num_bits & 7); // discard
4331    // drain the bit-packed data into header
4332    k = 0;
4333    while (a->num_bits > 0) {
4334       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
4335       a->code_buffer >>= 8;
4336       a->num_bits -= 8;
4337    }
4338    if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
4339    // now fill header the normal way
4340    while (k < 4)
4341       header[k++] = stbi__zget8(a);
4342    len  = header[1] * 256 + header[0];
4343    nlen = header[3] * 256 + header[2];
4344    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
4345    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
4346    if (a->zout + len > a->zout_end)
4347       if (!stbi__zexpand(a, a->zout, len)) return 0;
4348    memcpy(a->zout, a->zbuffer, len);
4349    a->zbuffer += len;
4350    a->zout += len;
4351    return 1;
4352 }
4353
4354 static int stbi__parse_zlib_header(stbi__zbuf *a)
4355 {
4356    int cmf   = stbi__zget8(a);
4357    int cm    = cmf & 15;
4358    /* int cinfo = cmf >> 4; */
4359    int flg   = stbi__zget8(a);
4360    if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4361    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4362    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
4363    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
4364    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4365    return 1;
4366 }
4367
4368 static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
4369 {
4370    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4371    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4372    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4373    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4374    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4375    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4376    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4377    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4378    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
4379 };
4380 static const stbi_uc stbi__zdefault_distance[32] =
4381 {
4382    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
4383 };
4384 /*
4385 Init algorithm:
4386 {
4387    int i;   // use <= to match clearly with spec
4388    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
4389    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
4390    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
4391    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
4392
4393    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
4394 }
4395 */
4396
4397 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4398 {
4399    int final, type;
4400    if (parse_header)
4401       if (!stbi__parse_zlib_header(a)) return 0;
4402    a->num_bits = 0;
4403    a->code_buffer = 0;
4404    do {
4405       final = stbi__zreceive(a,1);
4406       type = stbi__zreceive(a,2);
4407       if (type == 0) {
4408          if (!stbi__parse_uncompressed_block(a)) return 0;
4409       } else if (type == 3) {
4410          return 0;
4411       } else {
4412          if (type == 1) {
4413             // use fixed code lengths
4414             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
4415             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
4416          } else {
4417             if (!stbi__compute_huffman_codes(a)) return 0;
4418          }
4419          if (!stbi__parse_huffman_block(a)) return 0;
4420       }
4421    } while (!final);
4422    return 1;
4423 }
4424
4425 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4426 {
4427    a->zout_start = obuf;
4428    a->zout       = obuf;
4429    a->zout_end   = obuf + olen;
4430    a->z_expandable = exp;
4431
4432    return stbi__parse_zlib(a, parse_header);
4433 }
4434
4435 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4436 {
4437    stbi__zbuf a;
4438    char *p = (char *) stbi__malloc(initial_size);
4439    if (p == NULL) return NULL;
4440    a.zbuffer = (stbi_uc *) buffer;
4441    a.zbuffer_end = (stbi_uc *) buffer + len;
4442    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4443       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4444       return a.zout_start;
4445    } else {
4446       STBI_FREE(a.zout_start);
4447       return NULL;
4448    }
4449 }
4450
4451 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4452 {
4453    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4454 }
4455
4456 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4457 {
4458    stbi__zbuf a;
4459    char *p = (char *) stbi__malloc(initial_size);
4460    if (p == NULL) return NULL;
4461    a.zbuffer = (stbi_uc *) buffer;
4462    a.zbuffer_end = (stbi_uc *) buffer + len;
4463    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4464       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4465       return a.zout_start;
4466    } else {
4467       STBI_FREE(a.zout_start);
4468       return NULL;
4469    }
4470 }
4471
4472 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4473 {
4474    stbi__zbuf a;
4475    a.zbuffer = (stbi_uc *) ibuffer;
4476    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4477    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4478       return (int) (a.zout - a.zout_start);
4479    else
4480       return -1;
4481 }
4482
4483 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4484 {
4485    stbi__zbuf a;
4486    char *p = (char *) stbi__malloc(16384);
4487    if (p == NULL) return NULL;
4488    a.zbuffer = (stbi_uc *) buffer;
4489    a.zbuffer_end = (stbi_uc *) buffer+len;
4490    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4491       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4492       return a.zout_start;
4493    } else {
4494       STBI_FREE(a.zout_start);
4495       return NULL;
4496    }
4497 }
4498
4499 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4500 {
4501    stbi__zbuf a;
4502    a.zbuffer = (stbi_uc *) ibuffer;
4503    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4504    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4505       return (int) (a.zout - a.zout_start);
4506    else
4507       return -1;
4508 }
4509 #endif
4510
4511 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
4512 //    simple implementation
4513 //      - only 8-bit samples
4514 //      - no CRC checking
4515 //      - allocates lots of intermediate memory
4516 //        - avoids problem of streaming data between subsystems
4517 //        - avoids explicit window management
4518 //    performance
4519 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4520
4521 #ifndef STBI_NO_PNG
4522 typedef struct
4523 {
4524    stbi__uint32 length;
4525    stbi__uint32 type;
4526 } stbi__pngchunk;
4527
4528 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4529 {
4530    stbi__pngchunk c;
4531    c.length = stbi__get32be(s);
4532    c.type   = stbi__get32be(s);
4533    return c;
4534 }
4535
4536 static int stbi__check_png_header(stbi__context *s)
4537 {
4538    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4539    int i;
4540    for (i=0; i < 8; ++i)
4541       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
4542    return 1;
4543 }
4544
4545 typedef struct
4546 {
4547    stbi__context *s;
4548    stbi_uc *idata, *expanded, *out;
4549    int depth;
4550 } stbi__png;
4551
4552
4553 enum {
4554    STBI__F_none=0,
4555    STBI__F_sub=1,
4556    STBI__F_up=2,
4557    STBI__F_avg=3,
4558    STBI__F_paeth=4,
4559    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
4560    STBI__F_avg_first,
4561    STBI__F_paeth_first
4562 };
4563
4564 static stbi_uc first_row_filter[5] =
4565 {
4566    STBI__F_none,
4567    STBI__F_sub,
4568    STBI__F_none,
4569    STBI__F_avg_first,
4570    STBI__F_paeth_first
4571 };
4572
4573 static int stbi__paeth(int a, int b, int c)
4574 {
4575    int p = a + b - c;
4576    int pa = abs(p-a);
4577    int pb = abs(p-b);
4578    int pc = abs(p-c);
4579    if (pa <= pb && pa <= pc) return a;
4580    if (pb <= pc) return b;
4581    return c;
4582 }
4583
4584 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4585
4586 // create the png data from post-deflated data
4587 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4588 {
4589    int bytes = (depth == 16? 2 : 1);
4590    stbi__context *s = a->s;
4591    stbi__uint32 i,j,stride = x*out_n*bytes;
4592    stbi__uint32 img_len, img_width_bytes;
4593    int k;
4594    int img_n = s->img_n; // copy it into a local for later
4595
4596    int output_bytes = out_n*bytes;
4597    int filter_bytes = img_n*bytes;
4598    int width = x;
4599
4600    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4601    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4602    if (!a->out) return stbi__err("outofmem", "Out of memory");
4603
4604    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
4605    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4606    img_len = (img_width_bytes + 1) * y;
4607
4608    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
4609    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
4610    // so just check for raw_len < img_len always.
4611    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4612
4613    for (j=0; j < y; ++j) {
4614       stbi_uc *cur = a->out + stride*j;
4615       stbi_uc *prior;
4616       int filter = *raw++;
4617
4618       if (filter > 4)
4619          return stbi__err("invalid filter","Corrupt PNG");
4620
4621       if (depth < 8) {
4622          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
4623          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4624          filter_bytes = 1;
4625          width = img_width_bytes;
4626       }
4627       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
4628
4629       // if first row, use special filter that doesn't sample previous row
4630       if (j == 0) filter = first_row_filter[filter];
4631
4632       // handle first byte explicitly
4633       for (k=0; k < filter_bytes; ++k) {
4634          switch (filter) {
4635             case STBI__F_none       : cur[k] = raw[k]; break;
4636             case STBI__F_sub        : cur[k] = raw[k]; break;
4637             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4638             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4639             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4640             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4641             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4642          }
4643       }
4644
4645       if (depth == 8) {
4646          if (img_n != out_n)
4647             cur[img_n] = 255; // first pixel
4648          raw += img_n;
4649          cur += out_n;
4650          prior += out_n;
4651       } else if (depth == 16) {
4652          if (img_n != out_n) {
4653             cur[filter_bytes]   = 255; // first pixel top byte
4654             cur[filter_bytes+1] = 255; // first pixel bottom byte
4655          }
4656          raw += filter_bytes;
4657          cur += output_bytes;
4658          prior += output_bytes;
4659       } else {
4660          raw += 1;
4661          cur += 1;
4662          prior += 1;
4663       }
4664
4665       // this is a little gross, so that we don't switch per-pixel or per-component
4666       if (depth < 8 || img_n == out_n) {
4667          int nk = (width - 1)*filter_bytes;
4668          #define STBI__CASE(f) \
4669              case f:     \
4670                 for (k=0; k < nk; ++k)
4671          switch (filter) {
4672             // "none" filter turns into a memcpy here; make that explicit.
4673             case STBI__F_none:         memcpy(cur, raw, nk); break;
4674             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
4675             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4676             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
4677             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
4678             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
4679             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
4680          }
4681          #undef STBI__CASE
4682          raw += nk;
4683       } else {
4684          STBI_ASSERT(img_n+1 == out_n);
4685          #define STBI__CASE(f) \
4686              case f:     \
4687                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4688                    for (k=0; k < filter_bytes; ++k)
4689          switch (filter) {
4690             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
4691             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
4692             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4693             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
4694             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
4695             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
4696             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
4697          }
4698          #undef STBI__CASE
4699
4700          // the loop above sets the high byte of the pixels' alpha, but for
4701          // 16 bit png files we also need the low byte set. we'll do that here.
4702          if (depth == 16) {
4703             cur = a->out + stride*j; // start at the beginning of the row again
4704             for (i=0; i < x; ++i,cur+=output_bytes) {
4705                cur[filter_bytes+1] = 255;
4706             }
4707          }
4708       }
4709    }
4710
4711    // we make a separate pass to expand bits to pixels; for performance,
4712    // this could run two scanlines behind the above code, so it won't
4713    // intefere with filtering but will still be in the cache.
4714    if (depth < 8) {
4715       for (j=0; j < y; ++j) {
4716          stbi_uc *cur = a->out + stride*j;
4717          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4718          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4719          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4720          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4721
4722          // note that the final byte might overshoot and write more data than desired.
4723          // we can allocate enough data that this never writes out of memory, but it
4724          // could also overwrite the next scanline. can it overwrite non-empty data
4725          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4726          // so we need to explicitly clamp the final ones
4727
4728          if (depth == 4) {
4729             for (k=x*img_n; k >= 2; k-=2, ++in) {
4730                *cur++ = scale * ((*in >> 4)       );
4731                *cur++ = scale * ((*in     ) & 0x0f);
4732             }
4733             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4734          } else if (depth == 2) {
4735             for (k=x*img_n; k >= 4; k-=4, ++in) {
4736                *cur++ = scale * ((*in >> 6)       );
4737                *cur++ = scale * ((*in >> 4) & 0x03);
4738                *cur++ = scale * ((*in >> 2) & 0x03);
4739                *cur++ = scale * ((*in     ) & 0x03);
4740             }
4741             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4742             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4743             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4744          } else if (depth == 1) {
4745             for (k=x*img_n; k >= 8; k-=8, ++in) {
4746                *cur++ = scale * ((*in >> 7)       );
4747                *cur++ = scale * ((*in >> 6) & 0x01);
4748                *cur++ = scale * ((*in >> 5) & 0x01);
4749                *cur++ = scale * ((*in >> 4) & 0x01);
4750                *cur++ = scale * ((*in >> 3) & 0x01);
4751                *cur++ = scale * ((*in >> 2) & 0x01);
4752                *cur++ = scale * ((*in >> 1) & 0x01);
4753                *cur++ = scale * ((*in     ) & 0x01);
4754             }
4755             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4756             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4757             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4758             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4759             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4760             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4761             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4762          }
4763          if (img_n != out_n) {
4764             int q;
4765             // insert alpha = 255
4766             cur = a->out + stride*j;
4767             if (img_n == 1) {
4768                for (q=x-1; q >= 0; --q) {
4769                   cur[q*2+1] = 255;
4770                   cur[q*2+0] = cur[q];
4771                }
4772             } else {
4773                STBI_ASSERT(img_n == 3);
4774                for (q=x-1; q >= 0; --q) {
4775                   cur[q*4+3] = 255;
4776                   cur[q*4+2] = cur[q*3+2];
4777                   cur[q*4+1] = cur[q*3+1];
4778                   cur[q*4+0] = cur[q*3+0];
4779                }
4780             }
4781          }
4782       }
4783    } else if (depth == 16) {
4784       // force the image data from big-endian to platform-native.
4785       // this is done in a separate pass due to the decoding relying
4786       // on the data being untouched, but could probably be done
4787       // per-line during decode if care is taken.
4788       stbi_uc *cur = a->out;
4789       stbi__uint16 *cur16 = (stbi__uint16*)cur;
4790
4791       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
4792          *cur16 = (cur[0] << 8) | cur[1];
4793       }
4794    }
4795
4796    return 1;
4797 }
4798
4799 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4800 {
4801    int bytes = (depth == 16 ? 2 : 1);
4802    int out_bytes = out_n * bytes;
4803    stbi_uc *final;
4804    int p;
4805    if (!interlaced)
4806       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4807
4808    // de-interlacing
4809    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4810    if (!final) return stbi__err("outofmem", "Out of memory");
4811    for (p=0; p < 7; ++p) {
4812       int xorig[] = { 0,4,0,2,0,1,0 };
4813       int yorig[] = { 0,0,4,0,2,0,1 };
4814       int xspc[]  = { 8,8,4,4,2,2,1 };
4815       int yspc[]  = { 8,8,8,4,4,2,2 };
4816       int i,j,x,y;
4817       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4818       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4819       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4820       if (x && y) {
4821          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4822          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4823             STBI_FREE(final);
4824             return 0;
4825          }
4826          for (j=0; j < y; ++j) {
4827             for (i=0; i < x; ++i) {
4828                int out_y = j*yspc[p]+yorig[p];
4829                int out_x = i*xspc[p]+xorig[p];
4830                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4831                       a->out + (j*x+i)*out_bytes, out_bytes);
4832             }
4833          }
4834          STBI_FREE(a->out);
4835          image_data += img_len;
4836          image_data_len -= img_len;
4837       }
4838    }
4839    a->out = final;
4840
4841    return 1;
4842 }
4843
4844 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4845 {
4846    stbi__context *s = z->s;
4847    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4848    stbi_uc *p = z->out;
4849
4850    // compute color-based transparency, assuming we've
4851    // already got 255 as the alpha value in the output
4852    STBI_ASSERT(out_n == 2 || out_n == 4);
4853
4854    if (out_n == 2) {
4855       for (i=0; i < pixel_count; ++i) {
4856          p[1] = (p[0] == tc[0] ? 0 : 255);
4857          p += 2;
4858       }
4859    } else {
4860       for (i=0; i < pixel_count; ++i) {
4861          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4862             p[3] = 0;
4863          p += 4;
4864       }
4865    }
4866    return 1;
4867 }
4868
4869 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4870 {
4871    stbi__context *s = z->s;
4872    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4873    stbi__uint16 *p = (stbi__uint16*) z->out;
4874
4875    // compute color-based transparency, assuming we've
4876    // already got 65535 as the alpha value in the output
4877    STBI_ASSERT(out_n == 2 || out_n == 4);
4878
4879    if (out_n == 2) {
4880       for (i = 0; i < pixel_count; ++i) {
4881          p[1] = (p[0] == tc[0] ? 0 : 65535);
4882          p += 2;
4883       }
4884    } else {
4885       for (i = 0; i < pixel_count; ++i) {
4886          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4887             p[3] = 0;
4888          p += 4;
4889       }
4890    }
4891    return 1;
4892 }
4893
4894 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4895 {
4896    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4897    stbi_uc *p, *temp_out, *orig = a->out;
4898
4899    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4900    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4901
4902    // between here and free(out) below, exitting would leak
4903    temp_out = p;
4904
4905    if (pal_img_n == 3) {
4906       for (i=0; i < pixel_count; ++i) {
4907          int n = orig[i]*4;
4908          p[0] = palette[n  ];
4909          p[1] = palette[n+1];
4910          p[2] = palette[n+2];
4911          p += 3;
4912       }
4913    } else {
4914       for (i=0; i < pixel_count; ++i) {
4915          int n = orig[i]*4;
4916          p[0] = palette[n  ];
4917          p[1] = palette[n+1];
4918          p[2] = palette[n+2];
4919          p[3] = palette[n+3];
4920          p += 4;
4921       }
4922    }
4923    STBI_FREE(a->out);
4924    a->out = temp_out;
4925
4926    STBI_NOTUSED(len);
4927
4928    return 1;
4929 }
4930
4931 static int stbi__unpremultiply_on_load_global = 0;
4932 static int stbi__de_iphone_flag_global = 0;
4933
4934 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4935 {
4936    stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
4937 }
4938
4939 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4940 {
4941    stbi__de_iphone_flag_global = flag_true_if_should_convert;
4942 }
4943
4944 #ifndef STBI_THREAD_LOCAL
4945 #define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
4946 #define stbi__de_iphone_flag  stbi__de_iphone_flag_global
4947 #else
4948 static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
4949 static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
4950
4951 STBIDEF void stbi__unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
4952 {
4953    stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
4954    stbi__unpremultiply_on_load_set = 1;
4955 }
4956
4957 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
4958 {
4959    stbi__de_iphone_flag_local = flag_true_if_should_convert;
4960    stbi__de_iphone_flag_set = 1;
4961 }
4962
4963 #define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
4964                                        ? stbi__unpremultiply_on_load_local      \
4965                                        : stbi__unpremultiply_on_load_global)
4966 #define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
4967                                 ? stbi__de_iphone_flag_local                    \
4968                                 : stbi__de_iphone_flag_global)
4969 #endif // STBI_THREAD_LOCAL
4970
4971 static void stbi__de_iphone(stbi__png *z)
4972 {
4973    stbi__context *s = z->s;
4974    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4975    stbi_uc *p = z->out;
4976
4977    if (s->img_out_n == 3) {  // convert bgr to rgb
4978       for (i=0; i < pixel_count; ++i) {
4979          stbi_uc t = p[0];
4980          p[0] = p[2];
4981          p[2] = t;
4982          p += 3;
4983       }
4984    } else {
4985       STBI_ASSERT(s->img_out_n == 4);
4986       if (stbi__unpremultiply_on_load) {
4987          // convert bgr to rgb and unpremultiply
4988          for (i=0; i < pixel_count; ++i) {
4989             stbi_uc a = p[3];
4990             stbi_uc t = p[0];
4991             if (a) {
4992                stbi_uc half = a / 2;
4993                p[0] = (p[2] * 255 + half) / a;
4994                p[1] = (p[1] * 255 + half) / a;
4995                p[2] = ( t   * 255 + half) / a;
4996             } else {
4997                p[0] = p[2];
4998                p[2] = t;
4999             }
5000             p += 4;
5001          }
5002       } else {
5003          // convert bgr to rgb
5004          for (i=0; i < pixel_count; ++i) {
5005             stbi_uc t = p[0];
5006             p[0] = p[2];
5007             p[2] = t;
5008             p += 4;
5009          }
5010       }
5011    }
5012 }
5013
5014 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
5015
5016 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
5017 {
5018    stbi_uc palette[1024], pal_img_n=0;
5019    stbi_uc has_trans=0, tc[3]={0};
5020    stbi__uint16 tc16[3];
5021    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
5022    int first=1,k,interlace=0, color=0, is_iphone=0;
5023    stbi__context *s = z->s;
5024
5025    z->expanded = NULL;
5026    z->idata = NULL;
5027    z->out = NULL;
5028
5029    if (!stbi__check_png_header(s)) return 0;
5030
5031    if (scan == STBI__SCAN_type) return 1;
5032
5033    for (;;) {
5034       stbi__pngchunk c = stbi__get_chunk_header(s);
5035       switch (c.type) {
5036          case STBI__PNG_TYPE('C','g','B','I'):
5037             is_iphone = 1;
5038             stbi__skip(s, c.length);
5039             break;
5040          case STBI__PNG_TYPE('I','H','D','R'): {
5041             int comp,filter;
5042             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
5043             first = 0;
5044             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
5045             s->img_x = stbi__get32be(s);
5046             s->img_y = stbi__get32be(s);
5047             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5048             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5049             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
5050             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
5051             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
5052             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
5053             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
5054             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
5055             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
5056             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
5057             if (!pal_img_n) {
5058                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
5059                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
5060                if (scan == STBI__SCAN_header) return 1;
5061             } else {
5062                // if paletted, then pal_n is our final components, and
5063                // img_n is # components to decompress/filter.
5064                s->img_n = 1;
5065                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
5066                // if SCAN_header, have to scan to see if we have a tRNS
5067             }
5068             break;
5069          }
5070
5071          case STBI__PNG_TYPE('P','L','T','E'):  {
5072             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5073             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
5074             pal_len = c.length / 3;
5075             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
5076             for (i=0; i < pal_len; ++i) {
5077                palette[i*4+0] = stbi__get8(s);
5078                palette[i*4+1] = stbi__get8(s);
5079                palette[i*4+2] = stbi__get8(s);
5080                palette[i*4+3] = 255;
5081             }
5082             break;
5083          }
5084
5085          case STBI__PNG_TYPE('t','R','N','S'): {
5086             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5087             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
5088             if (pal_img_n) {
5089                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
5090                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
5091                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
5092                pal_img_n = 4;
5093                for (i=0; i < c.length; ++i)
5094                   palette[i*4+3] = stbi__get8(s);
5095             } else {
5096                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
5097                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
5098                has_trans = 1;
5099                if (z->depth == 16) {
5100                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
5101                } else {
5102                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
5103                }
5104             }
5105             break;
5106          }
5107
5108          case STBI__PNG_TYPE('I','D','A','T'): {
5109             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5110             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
5111             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
5112             if ((int)(ioff + c.length) < (int)ioff) return 0;
5113             if (ioff + c.length > idata_limit) {
5114                stbi__uint32 idata_limit_old = idata_limit;
5115                stbi_uc *p;
5116                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
5117                while (ioff + c.length > idata_limit)
5118                   idata_limit *= 2;
5119                STBI_NOTUSED(idata_limit_old);
5120                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
5121                z->idata = p;
5122             }
5123             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
5124             ioff += c.length;
5125             break;
5126          }
5127
5128          case STBI__PNG_TYPE('I','E','N','D'): {
5129             stbi__uint32 raw_len, bpl;
5130             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5131             if (scan != STBI__SCAN_load) return 1;
5132             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
5133             // initial guess for decoded data size to avoid unnecessary reallocs
5134             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5135             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5136             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
5137             if (z->expanded == NULL) return 0; // zlib should set error
5138             STBI_FREE(z->idata); z->idata = NULL;
5139             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
5140                s->img_out_n = s->img_n+1;
5141             else
5142                s->img_out_n = s->img_n;
5143             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
5144             if (has_trans) {
5145                if (z->depth == 16) {
5146                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
5147                } else {
5148                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
5149                }
5150             }
5151             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5152                stbi__de_iphone(z);
5153             if (pal_img_n) {
5154                // pal_img_n == 3 or 4
5155                s->img_n = pal_img_n; // record the actual colors we had
5156                s->img_out_n = pal_img_n;
5157                if (req_comp >= 3) s->img_out_n = req_comp;
5158                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5159                   return 0;
5160             } else if (has_trans) {
5161                // non-paletted image with tRNS -> source image has (constant) alpha
5162                ++s->img_n;
5163             }
5164             STBI_FREE(z->expanded); z->expanded = NULL;
5165             // end of PNG chunk, read and skip CRC
5166             stbi__get32be(s);
5167             return 1;
5168          }
5169
5170          default:
5171             // if critical, fail
5172             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5173             if ((c.type & (1 << 29)) == 0) {
5174                #ifndef STBI_NO_FAILURE_STRINGS
5175                // not threadsafe
5176                static char invalid_chunk[] = "XXXX PNG chunk not known";
5177                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5178                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5179                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
5180                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
5181                #endif
5182                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5183             }
5184             stbi__skip(s, c.length);
5185             break;
5186       }
5187       // end of PNG chunk, read and skip CRC
5188       stbi__get32be(s);
5189    }
5190 }
5191
5192 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
5193 {
5194    void *result=NULL;
5195    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
5196    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5197       if (p->depth <= 8)
5198          ri->bits_per_channel = 8;
5199       else if (p->depth == 16)
5200          ri->bits_per_channel = 16;
5201       else
5202          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
5203       result = p->out;
5204       p->out = NULL;
5205       if (req_comp && req_comp != p->s->img_out_n) {
5206          if (ri->bits_per_channel == 8)
5207             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5208          else
5209             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5210          p->s->img_out_n = req_comp;
5211          if (result == NULL) return result;
5212       }
5213       *x = p->s->img_x;
5214       *y = p->s->img_y;
5215       if (n) *n = p->s->img_n;
5216    }
5217    STBI_FREE(p->out);      p->out      = NULL;
5218    STBI_FREE(p->expanded); p->expanded = NULL;
5219    STBI_FREE(p->idata);    p->idata    = NULL;
5220
5221    return result;
5222 }
5223
5224 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5225 {
5226    stbi__png p;
5227    p.s = s;
5228    return stbi__do_png(&p, x,y,comp,req_comp, ri);
5229 }
5230
5231 static int stbi__png_test(stbi__context *s)
5232 {
5233    int r;
5234    r = stbi__check_png_header(s);
5235    stbi__rewind(s);
5236    return r;
5237 }
5238
5239 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
5240 {
5241    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5242       stbi__rewind( p->s );
5243       return 0;
5244    }
5245    if (x) *x = p->s->img_x;
5246    if (y) *y = p->s->img_y;
5247    if (comp) *comp = p->s->img_n;
5248    return 1;
5249 }
5250
5251 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
5252 {
5253    stbi__png p;
5254    p.s = s;
5255    return stbi__png_info_raw(&p, x, y, comp);
5256 }
5257
5258 static int stbi__png_is16(stbi__context *s)
5259 {
5260    stbi__png p;
5261    p.s = s;
5262    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
5263            return 0;
5264    if (p.depth != 16) {
5265       stbi__rewind(p.s);
5266       return 0;
5267    }
5268    return 1;
5269 }
5270 #endif
5271
5272 // Microsoft/Windows BMP image
5273
5274 #ifndef STBI_NO_BMP
5275 static int stbi__bmp_test_raw(stbi__context *s)
5276 {
5277    int r;
5278    int sz;
5279    if (stbi__get8(s) != 'B') return 0;
5280    if (stbi__get8(s) != 'M') return 0;
5281    stbi__get32le(s); // discard filesize
5282    stbi__get16le(s); // discard reserved
5283    stbi__get16le(s); // discard reserved
5284    stbi__get32le(s); // discard data offset
5285    sz = stbi__get32le(s);
5286    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5287    return r;
5288 }
5289
5290 static int stbi__bmp_test(stbi__context *s)
5291 {
5292    int r = stbi__bmp_test_raw(s);
5293    stbi__rewind(s);
5294    return r;
5295 }
5296
5297
5298 // returns 0..31 for the highest set bit
5299 static int stbi__high_bit(unsigned int z)
5300 {
5301    int n=0;
5302    if (z == 0) return -1;
5303    if (z >= 0x10000) { n += 16; z >>= 16; }
5304    if (z >= 0x00100) { n +=  8; z >>=  8; }
5305    if (z >= 0x00010) { n +=  4; z >>=  4; }
5306    if (z >= 0x00004) { n +=  2; z >>=  2; }
5307    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
5308    return n;
5309 }
5310
5311 static int stbi__bitcount(unsigned int a)
5312 {
5313    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
5314    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
5315    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5316    a = (a + (a >> 8)); // max 16 per 8 bits
5317    a = (a + (a >> 16)); // max 32 per 8 bits
5318    return a & 0xff;
5319 }
5320
5321 // extract an arbitrarily-aligned N-bit value (N=bits)
5322 // from v, and then make it 8-bits long and fractionally
5323 // extend it to full full range.
5324 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
5325 {
5326    static unsigned int mul_table[9] = {
5327       0,
5328       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
5329       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
5330    };
5331    static unsigned int shift_table[9] = {
5332       0, 0,0,1,0,2,4,6,0,
5333    };
5334    if (shift < 0)
5335       v <<= -shift;
5336    else
5337       v >>= shift;
5338    STBI_ASSERT(v < 256);
5339    v >>= (8-bits);
5340    STBI_ASSERT(bits >= 0 && bits <= 8);
5341    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
5342 }
5343
5344 typedef struct
5345 {
5346    int bpp, offset, hsz;
5347    unsigned int mr,mg,mb,ma, all_a;
5348    int extra_read;
5349 } stbi__bmp_data;
5350
5351 static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
5352 {
5353    // BI_BITFIELDS specifies masks explicitly, don't override
5354    if (compress == 3)
5355       return 1;
5356
5357    if (compress == 0) {
5358       if (info->bpp == 16) {
5359          info->mr = 31u << 10;
5360          info->mg = 31u <<  5;
5361          info->mb = 31u <<  0;
5362       } else if (info->bpp == 32) {
5363          info->mr = 0xffu << 16;
5364          info->mg = 0xffu <<  8;
5365          info->mb = 0xffu <<  0;
5366          info->ma = 0xffu << 24;
5367          info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5368       } else {
5369          // otherwise, use defaults, which is all-0
5370          info->mr = info->mg = info->mb = info->ma = 0;
5371       }
5372       return 1;
5373    }
5374    return 0; // error
5375 }
5376
5377 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5378 {
5379    int hsz;
5380    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5381    stbi__get32le(s); // discard filesize
5382    stbi__get16le(s); // discard reserved
5383    stbi__get16le(s); // discard reserved
5384    info->offset = stbi__get32le(s);
5385    info->hsz = hsz = stbi__get32le(s);
5386    info->mr = info->mg = info->mb = info->ma = 0;
5387    info->extra_read = 14;
5388
5389    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
5390
5391    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5392    if (hsz == 12) {
5393       s->img_x = stbi__get16le(s);
5394       s->img_y = stbi__get16le(s);
5395    } else {
5396       s->img_x = stbi__get32le(s);
5397       s->img_y = stbi__get32le(s);
5398    }
5399    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5400    info->bpp = stbi__get16le(s);
5401    if (hsz != 12) {
5402       int compress = stbi__get32le(s);
5403       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5404       if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
5405       if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
5406       stbi__get32le(s); // discard sizeof
5407       stbi__get32le(s); // discard hres
5408       stbi__get32le(s); // discard vres
5409       stbi__get32le(s); // discard colorsused
5410       stbi__get32le(s); // discard max important
5411       if (hsz == 40 || hsz == 56) {
5412          if (hsz == 56) {
5413             stbi__get32le(s);
5414             stbi__get32le(s);
5415             stbi__get32le(s);
5416             stbi__get32le(s);
5417          }
5418          if (info->bpp == 16 || info->bpp == 32) {
5419             if (compress == 0) {
5420                stbi__bmp_set_mask_defaults(info, compress);
5421             } else if (compress == 3) {
5422                info->mr = stbi__get32le(s);
5423                info->mg = stbi__get32le(s);
5424                info->mb = stbi__get32le(s);
5425                info->extra_read += 12;
5426                // not documented, but generated by photoshop and handled by mspaint
5427                if (info->mr == info->mg && info->mg == info->mb) {
5428                   // ?!?!?
5429                   return stbi__errpuc("bad BMP", "bad BMP");
5430                }
5431             } else
5432                return stbi__errpuc("bad BMP", "bad BMP");
5433          }
5434       } else {
5435          // V4/V5 header
5436          int i;
5437          if (hsz != 108 && hsz != 124)
5438             return stbi__errpuc("bad BMP", "bad BMP");
5439          info->mr = stbi__get32le(s);
5440          info->mg = stbi__get32le(s);
5441          info->mb = stbi__get32le(s);
5442          info->ma = stbi__get32le(s);
5443          if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
5444             stbi__bmp_set_mask_defaults(info, compress);
5445          stbi__get32le(s); // discard color space
5446          for (i=0; i < 12; ++i)
5447             stbi__get32le(s); // discard color space parameters
5448          if (hsz == 124) {
5449             stbi__get32le(s); // discard rendering intent
5450             stbi__get32le(s); // discard offset of profile data
5451             stbi__get32le(s); // discard size of profile data
5452             stbi__get32le(s); // discard reserved
5453          }
5454       }
5455    }
5456    return (void *) 1;
5457 }
5458
5459
5460 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5461 {
5462    stbi_uc *out;
5463    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5464    stbi_uc pal[256][4];
5465    int psize=0,i,j,width;
5466    int flip_vertically, pad, target;
5467    stbi__bmp_data info;
5468    STBI_NOTUSED(ri);
5469
5470    info.all_a = 255;
5471    if (stbi__bmp_parse_header(s, &info) == NULL)
5472       return NULL; // error code already set
5473
5474    flip_vertically = ((int) s->img_y) > 0;
5475    s->img_y = abs((int) s->img_y);
5476
5477    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5478    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5479
5480    mr = info.mr;
5481    mg = info.mg;
5482    mb = info.mb;
5483    ma = info.ma;
5484    all_a = info.all_a;
5485
5486    if (info.hsz == 12) {
5487       if (info.bpp < 24)
5488          psize = (info.offset - info.extra_read - 24) / 3;
5489    } else {
5490       if (info.bpp < 16)
5491          psize = (info.offset - info.extra_read - info.hsz) >> 2;
5492    }
5493    if (psize == 0) {
5494       if (info.offset != s->callback_already_read + (s->img_buffer - s->img_buffer_original)) {
5495         return stbi__errpuc("bad offset", "Corrupt BMP");
5496       }
5497    }
5498
5499    if (info.bpp == 24 && ma == 0xff000000)
5500       s->img_n = 3;
5501    else
5502       s->img_n = ma ? 4 : 3;
5503    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5504       target = req_comp;
5505    else
5506       target = s->img_n; // if they want monochrome, we'll post-convert
5507
5508    // sanity-check size
5509    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5510       return stbi__errpuc("too large", "Corrupt BMP");
5511
5512    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5513    if (!out) return stbi__errpuc("outofmem", "Out of memory");
5514    if (info.bpp < 16) {
5515       int z=0;
5516       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5517       for (i=0; i < psize; ++i) {
5518          pal[i][2] = stbi__get8(s);
5519          pal[i][1] = stbi__get8(s);
5520          pal[i][0] = stbi__get8(s);
5521          if (info.hsz != 12) stbi__get8(s);
5522          pal[i][3] = 255;
5523       }
5524       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5525       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
5526       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5527       else if (info.bpp == 8) width = s->img_x;
5528       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5529       pad = (-width)&3;
5530       if (info.bpp == 1) {
5531          for (j=0; j < (int) s->img_y; ++j) {
5532             int bit_offset = 7, v = stbi__get8(s);
5533             for (i=0; i < (int) s->img_x; ++i) {
5534                int color = (v>>bit_offset)&0x1;
5535                out[z++] = pal[color][0];
5536                out[z++] = pal[color][1];
5537                out[z++] = pal[color][2];
5538                if (target == 4) out[z++] = 255;
5539                if (i+1 == (int) s->img_x) break;
5540                if((--bit_offset) < 0) {
5541                   bit_offset = 7;
5542                   v = stbi__get8(s);
5543                }
5544             }
5545             stbi__skip(s, pad);
5546          }
5547       } else {
5548          for (j=0; j < (int) s->img_y; ++j) {
5549             for (i=0; i < (int) s->img_x; i += 2) {
5550                int v=stbi__get8(s),v2=0;
5551                if (info.bpp == 4) {
5552                   v2 = v & 15;
5553                   v >>= 4;
5554                }
5555                out[z++] = pal[v][0];
5556                out[z++] = pal[v][1];
5557                out[z++] = pal[v][2];
5558                if (target == 4) out[z++] = 255;
5559                if (i+1 == (int) s->img_x) break;
5560                v = (info.bpp == 8) ? stbi__get8(s) : v2;
5561                out[z++] = pal[v][0];
5562                out[z++] = pal[v][1];
5563                out[z++] = pal[v][2];
5564                if (target == 4) out[z++] = 255;
5565             }
5566             stbi__skip(s, pad);
5567          }
5568       }
5569    } else {
5570       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
5571       int z = 0;
5572       int easy=0;
5573       stbi__skip(s, info.offset - info.extra_read - info.hsz);
5574       if (info.bpp == 24) width = 3 * s->img_x;
5575       else if (info.bpp == 16) width = 2*s->img_x;
5576       else /* bpp = 32 and pad = 0 */ width=0;
5577       pad = (-width) & 3;
5578       if (info.bpp == 24) {
5579          easy = 1;
5580       } else if (info.bpp == 32) {
5581          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5582             easy = 2;
5583       }
5584       if (!easy) {
5585          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5586          // right shift amt to put high bit in position #7
5587          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
5588          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
5589          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
5590          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
5591          if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5592       }
5593       for (j=0; j < (int) s->img_y; ++j) {
5594          if (easy) {
5595             for (i=0; i < (int) s->img_x; ++i) {
5596                unsigned char a;
5597                out[z+2] = stbi__get8(s);
5598                out[z+1] = stbi__get8(s);
5599                out[z+0] = stbi__get8(s);
5600                z += 3;
5601                a = (easy == 2 ? stbi__get8(s) : 255);
5602                all_a |= a;
5603                if (target == 4) out[z++] = a;
5604             }
5605          } else {
5606             int bpp = info.bpp;
5607             for (i=0; i < (int) s->img_x; ++i) {
5608                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
5609                unsigned int a;
5610                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5611                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5612                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5613                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5614                all_a |= a;
5615                if (target == 4) out[z++] = STBI__BYTECAST(a);
5616             }
5617          }
5618          stbi__skip(s, pad);
5619       }
5620    }
5621
5622    // if alpha channel is all 0s, replace with all 255s
5623    if (target == 4 && all_a == 0)
5624       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
5625          out[i] = 255;
5626
5627    if (flip_vertically) {
5628       stbi_uc t;
5629       for (j=0; j < (int) s->img_y>>1; ++j) {
5630          stbi_uc *p1 = out +      j     *s->img_x*target;
5631          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
5632          for (i=0; i < (int) s->img_x*target; ++i) {
5633             t = p1[i]; p1[i] = p2[i]; p2[i] = t;
5634          }
5635       }
5636    }
5637
5638    if (req_comp && req_comp != target) {
5639       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5640       if (out == NULL) return out; // stbi__convert_format frees input on failure
5641    }
5642
5643    *x = s->img_x;
5644    *y = s->img_y;
5645    if (comp) *comp = s->img_n;
5646    return out;
5647 }
5648 #endif
5649
5650 // Targa Truevision - TGA
5651 // by Jonathan Dummer
5652 #ifndef STBI_NO_TGA
5653 // returns STBI_rgb or whatever, 0 on error
5654 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
5655 {
5656    // only RGB or RGBA (incl. 16bit) or grey allowed
5657    if (is_rgb16) *is_rgb16 = 0;
5658    switch(bits_per_pixel) {
5659       case 8:  return STBI_grey;
5660       case 16: if(is_grey) return STBI_grey_alpha;
5661                // fallthrough
5662       case 15: if(is_rgb16) *is_rgb16 = 1;
5663                return STBI_rgb;
5664       case 24: // fallthrough
5665       case 32: return bits_per_pixel/8;
5666       default: return 0;
5667    }
5668 }
5669
5670 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
5671 {
5672     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5673     int sz, tga_colormap_type;
5674     stbi__get8(s);                   // discard Offset
5675     tga_colormap_type = stbi__get8(s); // colormap type
5676     if( tga_colormap_type > 1 ) {
5677         stbi__rewind(s);
5678         return 0;      // only RGB or indexed allowed
5679     }
5680     tga_image_type = stbi__get8(s); // image type
5681     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
5682         if (tga_image_type != 1 && tga_image_type != 9) {
5683             stbi__rewind(s);
5684             return 0;
5685         }
5686         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5687         sz = stbi__get8(s);    //   check bits per palette color entry
5688         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
5689             stbi__rewind(s);
5690             return 0;
5691         }
5692         stbi__skip(s,4);       // skip image x and y origin
5693         tga_colormap_bpp = sz;
5694     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5695         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
5696             stbi__rewind(s);
5697             return 0; // only RGB or grey allowed, +/- RLE
5698         }
5699         stbi__skip(s,9); // skip colormap specification and image x/y origin
5700         tga_colormap_bpp = 0;
5701     }
5702     tga_w = stbi__get16le(s);
5703     if( tga_w < 1 ) {
5704         stbi__rewind(s);
5705         return 0;   // test width
5706     }
5707     tga_h = stbi__get16le(s);
5708     if( tga_h < 1 ) {
5709         stbi__rewind(s);
5710         return 0;   // test height
5711     }
5712     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5713     stbi__get8(s); // ignore alpha bits
5714     if (tga_colormap_bpp != 0) {
5715         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5716             // when using a colormap, tga_bits_per_pixel is the size of the indexes
5717             // I don't think anything but 8 or 16bit indexes makes sense
5718             stbi__rewind(s);
5719             return 0;
5720         }
5721         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5722     } else {
5723         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5724     }
5725     if(!tga_comp) {
5726       stbi__rewind(s);
5727       return 0;
5728     }
5729     if (x) *x = tga_w;
5730     if (y) *y = tga_h;
5731     if (comp) *comp = tga_comp;
5732     return 1;                   // seems to have passed everything
5733 }
5734
5735 static int stbi__tga_test(stbi__context *s)
5736 {
5737    int res = 0;
5738    int sz, tga_color_type;
5739    stbi__get8(s);      //   discard Offset
5740    tga_color_type = stbi__get8(s);   //   color type
5741    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
5742    sz = stbi__get8(s);   //   image type
5743    if ( tga_color_type == 1 ) { // colormapped (paletted) image
5744       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5745       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5746       sz = stbi__get8(s);    //   check bits per palette color entry
5747       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5748       stbi__skip(s,4);       // skip image x and y origin
5749    } else { // "normal" image w/o colormap
5750       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
5751       stbi__skip(s,9); // skip colormap specification and image x/y origin
5752    }
5753    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
5754    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
5755    sz = stbi__get8(s);   //   bits per pixel
5756    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
5757    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5758
5759    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5760
5761 errorEnd:
5762    stbi__rewind(s);
5763    return res;
5764 }
5765
5766 // read 16bit value and convert to 24bit RGB
5767 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5768 {
5769    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
5770    stbi__uint16 fiveBitMask = 31;
5771    // we have 3 channels with 5bits each
5772    int r = (px >> 10) & fiveBitMask;
5773    int g = (px >> 5) & fiveBitMask;
5774    int b = px & fiveBitMask;
5775    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5776    out[0] = (stbi_uc)((r * 255)/31);
5777    out[1] = (stbi_uc)((g * 255)/31);
5778    out[2] = (stbi_uc)((b * 255)/31);
5779
5780    // some people claim that the most significant bit might be used for alpha
5781    // (possibly if an alpha-bit is set in the "image descriptor byte")
5782    // but that only made 16bit test images completely translucent..
5783    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5784 }
5785
5786 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5787 {
5788    //   read in the TGA header stuff
5789    int tga_offset = stbi__get8(s);
5790    int tga_indexed = stbi__get8(s);
5791    int tga_image_type = stbi__get8(s);
5792    int tga_is_RLE = 0;
5793    int tga_palette_start = stbi__get16le(s);
5794    int tga_palette_len = stbi__get16le(s);
5795    int tga_palette_bits = stbi__get8(s);
5796    int tga_x_origin = stbi__get16le(s);
5797    int tga_y_origin = stbi__get16le(s);
5798    int tga_width = stbi__get16le(s);
5799    int tga_height = stbi__get16le(s);
5800    int tga_bits_per_pixel = stbi__get8(s);
5801    int tga_comp, tga_rgb16=0;
5802    int tga_inverted = stbi__get8(s);
5803    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5804    //   image data
5805    unsigned char *tga_data;
5806    unsigned char *tga_palette = NULL;
5807    int i, j;
5808    unsigned char raw_data[4] = {0};
5809    int RLE_count = 0;
5810    int RLE_repeating = 0;
5811    int read_next_pixel = 1;
5812    STBI_NOTUSED(ri);
5813    STBI_NOTUSED(tga_x_origin); // @TODO
5814    STBI_NOTUSED(tga_y_origin); // @TODO
5815
5816    if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5817    if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5818
5819    //   do a tiny bit of precessing
5820    if ( tga_image_type >= 8 )
5821    {
5822       tga_image_type -= 8;
5823       tga_is_RLE = 1;
5824    }
5825    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5826
5827    //   If I'm paletted, then I'll use the number of bits from the palette
5828    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5829    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5830
5831    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5832       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5833
5834    //   tga info
5835    *x = tga_width;
5836    *y = tga_height;
5837    if (comp) *comp = tga_comp;
5838
5839    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
5840       return stbi__errpuc("too large", "Corrupt TGA");
5841
5842    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
5843    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5844
5845    // skip to the data's starting position (offset usually = 0)
5846    stbi__skip(s, tga_offset );
5847
5848    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
5849       for (i=0; i < tga_height; ++i) {
5850          int row = tga_inverted ? tga_height -i - 1 : i;
5851          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5852          stbi__getn(s, tga_row, tga_width * tga_comp);
5853       }
5854    } else  {
5855       //   do I need to load a palette?
5856       if ( tga_indexed)
5857       {
5858          if (tga_palette_len == 0) {  /* you have to have at least one entry! */
5859             STBI_FREE(tga_data);
5860             return stbi__errpuc("bad palette", "Corrupt TGA");
5861          }
5862
5863          //   any data to skip? (offset usually = 0)
5864          stbi__skip(s, tga_palette_start );
5865          //   load the palette
5866          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
5867          if (!tga_palette) {
5868             STBI_FREE(tga_data);
5869             return stbi__errpuc("outofmem", "Out of memory");
5870          }
5871          if (tga_rgb16) {
5872             stbi_uc *pal_entry = tga_palette;
5873             STBI_ASSERT(tga_comp == STBI_rgb);
5874             for (i=0; i < tga_palette_len; ++i) {
5875                stbi__tga_read_rgb16(s, pal_entry);
5876                pal_entry += tga_comp;
5877             }
5878          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5879                STBI_FREE(tga_data);
5880                STBI_FREE(tga_palette);
5881                return stbi__errpuc("bad palette", "Corrupt TGA");
5882          }
5883       }
5884       //   load the data
5885       for (i=0; i < tga_width * tga_height; ++i)
5886       {
5887          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5888          if ( tga_is_RLE )
5889          {
5890             if ( RLE_count == 0 )
5891             {
5892                //   yep, get the next byte as a RLE command
5893                int RLE_cmd = stbi__get8(s);
5894                RLE_count = 1 + (RLE_cmd & 127);
5895                RLE_repeating = RLE_cmd >> 7;
5896                read_next_pixel = 1;
5897             } else if ( !RLE_repeating )
5898             {
5899                read_next_pixel = 1;
5900             }
5901          } else
5902          {
5903             read_next_pixel = 1;
5904          }
5905          //   OK, if I need to read a pixel, do it now
5906          if ( read_next_pixel )
5907          {
5908             //   load however much data we did have
5909             if ( tga_indexed )
5910             {
5911                // read in index, then perform the lookup
5912                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5913                if ( pal_idx >= tga_palette_len ) {
5914                   // invalid index
5915                   pal_idx = 0;
5916                }
5917                pal_idx *= tga_comp;
5918                for (j = 0; j < tga_comp; ++j) {
5919                   raw_data[j] = tga_palette[pal_idx+j];
5920                }
5921             } else if(tga_rgb16) {
5922                STBI_ASSERT(tga_comp == STBI_rgb);
5923                stbi__tga_read_rgb16(s, raw_data);
5924             } else {
5925                //   read in the data raw
5926                for (j = 0; j < tga_comp; ++j) {
5927                   raw_data[j] = stbi__get8(s);
5928                }
5929             }
5930             //   clear the reading flag for the next pixel
5931             read_next_pixel = 0;
5932          } // end of reading a pixel
5933
5934          // copy data
5935          for (j = 0; j < tga_comp; ++j)
5936            tga_data[i*tga_comp+j] = raw_data[j];
5937
5938          //   in case we're in RLE mode, keep counting down
5939          --RLE_count;
5940       }
5941       //   do I need to invert the image?
5942       if ( tga_inverted )
5943       {
5944          for (j = 0; j*2 < tga_height; ++j)
5945          {
5946             int index1 = j * tga_width * tga_comp;
5947             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5948             for (i = tga_width * tga_comp; i > 0; --i)
5949             {
5950                unsigned char temp = tga_data[index1];
5951                tga_data[index1] = tga_data[index2];
5952                tga_data[index2] = temp;
5953                ++index1;
5954                ++index2;
5955             }
5956          }
5957       }
5958       //   clear my palette, if I had one
5959       if ( tga_palette != NULL )
5960       {
5961          STBI_FREE( tga_palette );
5962       }
5963    }
5964
5965    // swap RGB - if the source data was RGB16, it already is in the right order
5966    if (tga_comp >= 3 && !tga_rgb16)
5967    {
5968       unsigned char* tga_pixel = tga_data;
5969       for (i=0; i < tga_width * tga_height; ++i)
5970       {
5971          unsigned char temp = tga_pixel[0];
5972          tga_pixel[0] = tga_pixel[2];
5973          tga_pixel[2] = temp;
5974          tga_pixel += tga_comp;
5975       }
5976    }
5977
5978    // convert to target component count
5979    if (req_comp && req_comp != tga_comp)
5980       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5981
5982    //   the things I do to get rid of an error message, and yet keep
5983    //   Microsoft's C compilers happy... [8^(
5984    tga_palette_start = tga_palette_len = tga_palette_bits =
5985          tga_x_origin = tga_y_origin = 0;
5986    STBI_NOTUSED(tga_palette_start);
5987    //   OK, done
5988    return tga_data;
5989 }
5990 #endif
5991
5992 // *************************************************************************************************
5993 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5994
5995 #ifndef STBI_NO_PSD
5996 static int stbi__psd_test(stbi__context *s)
5997 {
5998    int r = (stbi__get32be(s) == 0x38425053);
5999    stbi__rewind(s);
6000    return r;
6001 }
6002
6003 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
6004 {
6005    int count, nleft, len;
6006
6007    count = 0;
6008    while ((nleft = pixelCount - count) > 0) {
6009       len = stbi__get8(s);
6010       if (len == 128) {
6011          // No-op.
6012       } else if (len < 128) {
6013          // Copy next len+1 bytes literally.
6014          len++;
6015          if (len > nleft) return 0; // corrupt data
6016          count += len;
6017          while (len) {
6018             *p = stbi__get8(s);
6019             p += 4;
6020             len--;
6021          }
6022       } else if (len > 128) {
6023          stbi_uc   val;
6024          // Next -len+1 bytes in the dest are replicated from next source byte.
6025          // (Interpret len as a negative 8-bit int.)
6026          len = 257 - len;
6027          if (len > nleft) return 0; // corrupt data
6028          val = stbi__get8(s);
6029          count += len;
6030          while (len) {
6031             *p = val;
6032             p += 4;
6033             len--;
6034          }
6035       }
6036    }
6037
6038    return 1;
6039 }
6040
6041 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
6042 {
6043    int pixelCount;
6044    int channelCount, compression;
6045    int channel, i;
6046    int bitdepth;
6047    int w,h;
6048    stbi_uc *out;
6049    STBI_NOTUSED(ri);
6050
6051    // Check identifier
6052    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
6053       return stbi__errpuc("not PSD", "Corrupt PSD image");
6054
6055    // Check file type version.
6056    if (stbi__get16be(s) != 1)
6057       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
6058
6059    // Skip 6 reserved bytes.
6060    stbi__skip(s, 6 );
6061
6062    // Read the number of channels (R, G, B, A, etc).
6063    channelCount = stbi__get16be(s);
6064    if (channelCount < 0 || channelCount > 16)
6065       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
6066
6067    // Read the rows and columns of the image.
6068    h = stbi__get32be(s);
6069    w = stbi__get32be(s);
6070
6071    if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6072    if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6073
6074    // Make sure the depth is 8 bits.
6075    bitdepth = stbi__get16be(s);
6076    if (bitdepth != 8 && bitdepth != 16)
6077       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
6078
6079    // Make sure the color mode is RGB.
6080    // Valid options are:
6081    //   0: Bitmap
6082    //   1: Grayscale
6083    //   2: Indexed color
6084    //   3: RGB color
6085    //   4: CMYK color
6086    //   7: Multichannel
6087    //   8: Duotone
6088    //   9: Lab color
6089    if (stbi__get16be(s) != 3)
6090       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
6091
6092    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
6093    stbi__skip(s,stbi__get32be(s) );
6094
6095    // Skip the image resources.  (resolution, pen tool paths, etc)
6096    stbi__skip(s, stbi__get32be(s) );
6097
6098    // Skip the reserved data.
6099    stbi__skip(s, stbi__get32be(s) );
6100
6101    // Find out if the data is compressed.
6102    // Known values:
6103    //   0: no compression
6104    //   1: RLE compressed
6105    compression = stbi__get16be(s);
6106    if (compression > 1)
6107       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
6108
6109    // Check size
6110    if (!stbi__mad3sizes_valid(4, w, h, 0))
6111       return stbi__errpuc("too large", "Corrupt PSD");
6112
6113    // Create the destination image.
6114
6115    if (!compression && bitdepth == 16 && bpc == 16) {
6116       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
6117       ri->bits_per_channel = 16;
6118    } else
6119       out = (stbi_uc *) stbi__malloc(4 * w*h);
6120
6121    if (!out) return stbi__errpuc("outofmem", "Out of memory");
6122    pixelCount = w*h;
6123
6124    // Initialize the data to zero.
6125    //memset( out, 0, pixelCount * 4 );
6126
6127    // Finally, the image data.
6128    if (compression) {
6129       // RLE as used by .PSD and .TIFF
6130       // Loop until you get the number of unpacked bytes you are expecting:
6131       //     Read the next source byte into n.
6132       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
6133       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
6134       //     Else if n is 128, noop.
6135       // Endloop
6136
6137       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
6138       // which we're going to just skip.
6139       stbi__skip(s, h * channelCount * 2 );
6140
6141       // Read the RLE data by channel.
6142       for (channel = 0; channel < 4; channel++) {
6143          stbi_uc *p;
6144
6145          p = out+channel;
6146          if (channel >= channelCount) {
6147             // Fill this channel with default data.
6148             for (i = 0; i < pixelCount; i++, p += 4)
6149                *p = (channel == 3 ? 255 : 0);
6150          } else {
6151             // Read the RLE data.
6152             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
6153                STBI_FREE(out);
6154                return stbi__errpuc("corrupt", "bad RLE data");
6155             }
6156          }
6157       }
6158
6159    } else {
6160       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
6161       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
6162
6163       // Read the data by channel.
6164       for (channel = 0; channel < 4; channel++) {
6165          if (channel >= channelCount) {
6166             // Fill this channel with default data.
6167             if (bitdepth == 16 && bpc == 16) {
6168                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6169                stbi__uint16 val = channel == 3 ? 65535 : 0;
6170                for (i = 0; i < pixelCount; i++, q += 4)
6171                   *q = val;
6172             } else {
6173                stbi_uc *p = out+channel;
6174                stbi_uc val = channel == 3 ? 255 : 0;
6175                for (i = 0; i < pixelCount; i++, p += 4)
6176                   *p = val;
6177             }
6178          } else {
6179             if (ri->bits_per_channel == 16) {    // output bpc
6180                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6181                for (i = 0; i < pixelCount; i++, q += 4)
6182                   *q = (stbi__uint16) stbi__get16be(s);
6183             } else {
6184                stbi_uc *p = out+channel;
6185                if (bitdepth == 16) {  // input bpc
6186                   for (i = 0; i < pixelCount; i++, p += 4)
6187                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
6188                } else {
6189                   for (i = 0; i < pixelCount; i++, p += 4)
6190                      *p = stbi__get8(s);
6191                }
6192             }
6193          }
6194       }
6195    }
6196
6197    // remove weird white matte from PSD
6198    if (channelCount >= 4) {
6199       if (ri->bits_per_channel == 16) {
6200          for (i=0; i < w*h; ++i) {
6201             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
6202             if (pixel[3] != 0 && pixel[3] != 65535) {
6203                float a = pixel[3] / 65535.0f;
6204                float ra = 1.0f / a;
6205                float inv_a = 65535.0f * (1 - ra);
6206                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
6207                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
6208                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
6209             }
6210          }
6211       } else {
6212          for (i=0; i < w*h; ++i) {
6213             unsigned char *pixel = out + 4*i;
6214             if (pixel[3] != 0 && pixel[3] != 255) {
6215                float a = pixel[3] / 255.0f;
6216                float ra = 1.0f / a;
6217                float inv_a = 255.0f * (1 - ra);
6218                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
6219                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
6220                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
6221             }
6222          }
6223       }
6224    }
6225
6226    // convert to desired output format
6227    if (req_comp && req_comp != 4) {
6228       if (ri->bits_per_channel == 16)
6229          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
6230       else
6231          out = stbi__convert_format(out, 4, req_comp, w, h);
6232       if (out == NULL) return out; // stbi__convert_format frees input on failure
6233    }
6234
6235    if (comp) *comp = 4;
6236    *y = h;
6237    *x = w;
6238
6239    return out;
6240 }
6241 #endif
6242
6243 // *************************************************************************************************
6244 // Softimage PIC loader
6245 // by Tom Seddon
6246 //
6247 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
6248 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
6249
6250 #ifndef STBI_NO_PIC
6251 static int stbi__pic_is4(stbi__context *s,const char *str)
6252 {
6253    int i;
6254    for (i=0; i<4; ++i)
6255       if (stbi__get8(s) != (stbi_uc)str[i])
6256          return 0;
6257
6258    return 1;
6259 }
6260
6261 static int stbi__pic_test_core(stbi__context *s)
6262 {
6263    int i;
6264
6265    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
6266       return 0;
6267
6268    for(i=0;i<84;++i)
6269       stbi__get8(s);
6270
6271    if (!stbi__pic_is4(s,"PICT"))
6272       return 0;
6273
6274    return 1;
6275 }
6276
6277 typedef struct
6278 {
6279    stbi_uc size,type,channel;
6280 } stbi__pic_packet;
6281
6282 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
6283 {
6284    int mask=0x80, i;
6285
6286    for (i=0; i<4; ++i, mask>>=1) {
6287       if (channel & mask) {
6288          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
6289          dest[i]=stbi__get8(s);
6290       }
6291    }
6292
6293    return dest;
6294 }
6295
6296 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
6297 {
6298    int mask=0x80,i;
6299
6300    for (i=0;i<4; ++i, mask>>=1)
6301       if (channel&mask)
6302          dest[i]=src[i];
6303 }
6304
6305 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
6306 {
6307    int act_comp=0,num_packets=0,y,chained;
6308    stbi__pic_packet packets[10];
6309
6310    // this will (should...) cater for even some bizarre stuff like having data
6311     // for the same channel in multiple packets.
6312    do {
6313       stbi__pic_packet *packet;
6314
6315       if (num_packets==sizeof(packets)/sizeof(packets[0]))
6316          return stbi__errpuc("bad format","too many packets");
6317
6318       packet = &packets[num_packets++];
6319
6320       chained = stbi__get8(s);
6321       packet->size    = stbi__get8(s);
6322       packet->type    = stbi__get8(s);
6323       packet->channel = stbi__get8(s);
6324
6325       act_comp |= packet->channel;
6326
6327       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
6328       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
6329    } while (chained);
6330
6331    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
6332
6333    for(y=0; y<height; ++y) {
6334       int packet_idx;
6335
6336       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
6337          stbi__pic_packet *packet = &packets[packet_idx];
6338          stbi_uc *dest = result+y*width*4;
6339
6340          switch (packet->type) {
6341             default:
6342                return stbi__errpuc("bad format","packet has bad compression type");
6343
6344             case 0: {//uncompressed
6345                int x;
6346
6347                for(x=0;x<width;++x, dest+=4)
6348                   if (!stbi__readval(s,packet->channel,dest))
6349                      return 0;
6350                break;
6351             }
6352
6353             case 1://Pure RLE
6354                {
6355                   int left=width, i;
6356
6357                   while (left>0) {
6358                      stbi_uc count,value[4];
6359
6360                      count=stbi__get8(s);
6361                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
6362
6363                      if (count > left)
6364                         count = (stbi_uc) left;
6365
6366                      if (!stbi__readval(s,packet->channel,value))  return 0;
6367
6368                      for(i=0; i<count; ++i,dest+=4)
6369                         stbi__copyval(packet->channel,dest,value);
6370                      left -= count;
6371                   }
6372                }
6373                break;
6374
6375             case 2: {//Mixed RLE
6376                int left=width;
6377                while (left>0) {
6378                   int count = stbi__get8(s), i;
6379                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
6380
6381                   if (count >= 128) { // Repeated
6382                      stbi_uc value[4];
6383
6384                      if (count==128)
6385                         count = stbi__get16be(s);
6386                      else
6387                         count -= 127;
6388                      if (count > left)
6389                         return stbi__errpuc("bad file","scanline overrun");
6390
6391                      if (!stbi__readval(s,packet->channel,value))
6392                         return 0;
6393
6394                      for(i=0;i<count;++i, dest += 4)
6395                         stbi__copyval(packet->channel,dest,value);
6396                   } else { // Raw
6397                      ++count;
6398                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
6399
6400                      for(i=0;i<count;++i, dest+=4)
6401                         if (!stbi__readval(s,packet->channel,dest))
6402                            return 0;
6403                   }
6404                   left-=count;
6405                }
6406                break;
6407             }
6408          }
6409       }
6410    }
6411
6412    return result;
6413 }
6414
6415 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
6416 {
6417    stbi_uc *result;
6418    int i, x,y, internal_comp;
6419    STBI_NOTUSED(ri);
6420
6421    if (!comp) comp = &internal_comp;
6422
6423    for (i=0; i<92; ++i)
6424       stbi__get8(s);
6425
6426    x = stbi__get16be(s);
6427    y = stbi__get16be(s);
6428
6429    if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6430    if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6431
6432    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
6433    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
6434
6435    stbi__get32be(s); //skip `ratio'
6436    stbi__get16be(s); //skip `fields'
6437    stbi__get16be(s); //skip `pad'
6438
6439    // intermediate buffer is RGBA
6440    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
6441    if (!result) return stbi__errpuc("outofmem", "Out of memory");
6442    memset(result, 0xff, x*y*4);
6443
6444    if (!stbi__pic_load_core(s,x,y,comp, result)) {
6445       STBI_FREE(result);
6446       result=0;
6447    }
6448    *px = x;
6449    *py = y;
6450    if (req_comp == 0) req_comp = *comp;
6451    result=stbi__convert_format(result,4,req_comp,x,y);
6452
6453    return result;
6454 }
6455
6456 static int stbi__pic_test(stbi__context *s)
6457 {
6458    int r = stbi__pic_test_core(s);
6459    stbi__rewind(s);
6460    return r;
6461 }
6462 #endif
6463
6464 // *************************************************************************************************
6465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6466
6467 #ifndef STBI_NO_GIF
6468 typedef struct
6469 {
6470    stbi__int16 prefix;
6471    stbi_uc first;
6472    stbi_uc suffix;
6473 } stbi__gif_lzw;
6474
6475 typedef struct
6476 {
6477    int w,h;
6478    stbi_uc *out;                 // output buffer (always 4 components)
6479    stbi_uc *background;          // The current "background" as far as a gif is concerned
6480    stbi_uc *history;
6481    int flags, bgindex, ratio, transparent, eflags;
6482    stbi_uc  pal[256][4];
6483    stbi_uc lpal[256][4];
6484    stbi__gif_lzw codes[8192];
6485    stbi_uc *color_table;
6486    int parse, step;
6487    int lflags;
6488    int start_x, start_y;
6489    int max_x, max_y;
6490    int cur_x, cur_y;
6491    int line_size;
6492    int delay;
6493 } stbi__gif;
6494
6495 static int stbi__gif_test_raw(stbi__context *s)
6496 {
6497    int sz;
6498    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6499    sz = stbi__get8(s);
6500    if (sz != '9' && sz != '7') return 0;
6501    if (stbi__get8(s) != 'a') return 0;
6502    return 1;
6503 }
6504
6505 static int stbi__gif_test(stbi__context *s)
6506 {
6507    int r = stbi__gif_test_raw(s);
6508    stbi__rewind(s);
6509    return r;
6510 }
6511
6512 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6513 {
6514    int i;
6515    for (i=0; i < num_entries; ++i) {
6516       pal[i][2] = stbi__get8(s);
6517       pal[i][1] = stbi__get8(s);
6518       pal[i][0] = stbi__get8(s);
6519       pal[i][3] = transp == i ? 0 : 255;
6520    }
6521 }
6522
6523 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6524 {
6525    stbi_uc version;
6526    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6527       return stbi__err("not GIF", "Corrupt GIF");
6528
6529    version = stbi__get8(s);
6530    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
6531    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
6532
6533    stbi__g_failure_reason = "";
6534    g->w = stbi__get16le(s);
6535    g->h = stbi__get16le(s);
6536    g->flags = stbi__get8(s);
6537    g->bgindex = stbi__get8(s);
6538    g->ratio = stbi__get8(s);
6539    g->transparent = -1;
6540
6541    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6542    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6543
6544    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
6545
6546    if (is_info) return 1;
6547
6548    if (g->flags & 0x80)
6549       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6550
6551    return 1;
6552 }
6553
6554 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6555 {
6556    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6557    if (!g) return stbi__err("outofmem", "Out of memory");
6558    if (!stbi__gif_header(s, g, comp, 1)) {
6559       STBI_FREE(g);
6560       stbi__rewind( s );
6561       return 0;
6562    }
6563    if (x) *x = g->w;
6564    if (y) *y = g->h;
6565    STBI_FREE(g);
6566    return 1;
6567 }
6568
6569 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6570 {
6571    stbi_uc *p, *c;
6572    int idx;
6573
6574    // recurse to decode the prefixes, since the linked-list is backwards,
6575    // and working backwards through an interleaved image would be nasty
6576    if (g->codes[code].prefix >= 0)
6577       stbi__out_gif_code(g, g->codes[code].prefix);
6578
6579    if (g->cur_y >= g->max_y) return;
6580
6581    idx = g->cur_x + g->cur_y;
6582    p = &g->out[idx];
6583    g->history[idx / 4] = 1;
6584
6585    c = &g->color_table[g->codes[code].suffix * 4];
6586    if (c[3] > 128) { // don't render transparent pixels;
6587       p[0] = c[2];
6588       p[1] = c[1];
6589       p[2] = c[0];
6590       p[3] = c[3];
6591    }
6592    g->cur_x += 4;
6593
6594    if (g->cur_x >= g->max_x) {
6595       g->cur_x = g->start_x;
6596       g->cur_y += g->step;
6597
6598       while (g->cur_y >= g->max_y && g->parse > 0) {
6599          g->step = (1 << g->parse) * g->line_size;
6600          g->cur_y = g->start_y + (g->step >> 1);
6601          --g->parse;
6602       }
6603    }
6604 }
6605
6606 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6607 {
6608    stbi_uc lzw_cs;
6609    stbi__int32 len, init_code;
6610    stbi__uint32 first;
6611    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6612    stbi__gif_lzw *p;
6613
6614    lzw_cs = stbi__get8(s);
6615    if (lzw_cs > 12) return NULL;
6616    clear = 1 << lzw_cs;
6617    first = 1;
6618    codesize = lzw_cs + 1;
6619    codemask = (1 << codesize) - 1;
6620    bits = 0;
6621    valid_bits = 0;
6622    for (init_code = 0; init_code < clear; init_code++) {
6623       g->codes[init_code].prefix = -1;
6624       g->codes[init_code].first = (stbi_uc) init_code;
6625       g->codes[init_code].suffix = (stbi_uc) init_code;
6626    }
6627
6628    // support no starting clear code
6629    avail = clear+2;
6630    oldcode = -1;
6631
6632    len = 0;
6633    for(;;) {
6634       if (valid_bits < codesize) {
6635          if (len == 0) {
6636             len = stbi__get8(s); // start new block
6637             if (len == 0)
6638                return g->out;
6639          }
6640          --len;
6641          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6642          valid_bits += 8;
6643       } else {
6644          stbi__int32 code = bits & codemask;
6645          bits >>= codesize;
6646          valid_bits -= codesize;
6647          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6648          if (code == clear) {  // clear code
6649             codesize = lzw_cs + 1;
6650             codemask = (1 << codesize) - 1;
6651             avail = clear + 2;
6652             oldcode = -1;
6653             first = 0;
6654          } else if (code == clear + 1) { // end of stream code
6655             stbi__skip(s, len);
6656             while ((len = stbi__get8(s)) > 0)
6657                stbi__skip(s,len);
6658             return g->out;
6659          } else if (code <= avail) {
6660             if (first) {
6661                return stbi__errpuc("no clear code", "Corrupt GIF");
6662             }
6663
6664             if (oldcode >= 0) {
6665                p = &g->codes[avail++];
6666                if (avail > 8192) {
6667                   return stbi__errpuc("too many codes", "Corrupt GIF");
6668                }
6669
6670                p->prefix = (stbi__int16) oldcode;
6671                p->first = g->codes[oldcode].first;
6672                p->suffix = (code == avail) ? p->first : g->codes[code].first;
6673             } else if (code == avail)
6674                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6675
6676             stbi__out_gif_code(g, (stbi__uint16) code);
6677
6678             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6679                codesize++;
6680                codemask = (1 << codesize) - 1;
6681             }
6682
6683             oldcode = code;
6684          } else {
6685             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6686          }
6687       }
6688    }
6689 }
6690
6691 // this function is designed to support animated gifs, although stb_image doesn't support it
6692 // two back is the image from two frames ago, used for a very specific disposal format
6693 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
6694 {
6695    int dispose;
6696    int first_frame;
6697    int pi;
6698    int pcount;
6699    STBI_NOTUSED(req_comp);
6700
6701    // on first frame, any non-written pixels get the background colour (non-transparent)
6702    first_frame = 0;
6703    if (g->out == 0) {
6704       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
6705       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
6706          return stbi__errpuc("too large", "GIF image is too large");
6707       pcount = g->w * g->h;
6708       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
6709       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
6710       g->history = (stbi_uc *) stbi__malloc(pcount);
6711       if (!g->out || !g->background || !g->history)
6712          return stbi__errpuc("outofmem", "Out of memory");
6713
6714       // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
6715       // background colour is only used for pixels that are not rendered first frame, after that "background"
6716       // color refers to the color that was there the previous frame.
6717       memset(g->out, 0x00, 4 * pcount);
6718       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
6719       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
6720       first_frame = 1;
6721    } else {
6722       // second frame - how do we dispose of the previous one?
6723       dispose = (g->eflags & 0x1C) >> 2;
6724       pcount = g->w * g->h;
6725
6726       if ((dispose == 3) && (two_back == 0)) {
6727          dispose = 2; // if I don't have an image to revert back to, default to the old background
6728       }
6729
6730       if (dispose == 3) { // use previous graphic
6731          for (pi = 0; pi < pcount; ++pi) {
6732             if (g->history[pi]) {
6733                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
6734             }
6735          }
6736       } else if (dispose == 2) {
6737          // restore what was changed last frame to background before that frame;
6738          for (pi = 0; pi < pcount; ++pi) {
6739             if (g->history[pi]) {
6740                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
6741             }
6742          }
6743       } else {
6744          // This is a non-disposal case eithe way, so just
6745          // leave the pixels as is, and they will become the new background
6746          // 1: do not dispose
6747          // 0:  not specified.
6748       }
6749
6750       // background is what out is after the undoing of the previou frame;
6751       memcpy( g->background, g->out, 4 * g->w * g->h );
6752    }
6753
6754    // clear my history;
6755    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
6756
6757    for (;;) {
6758       int tag = stbi__get8(s);
6759       switch (tag) {
6760          case 0x2C: /* Image Descriptor */
6761          {
6762             stbi__int32 x, y, w, h;
6763             stbi_uc *o;
6764
6765             x = stbi__get16le(s);
6766             y = stbi__get16le(s);
6767             w = stbi__get16le(s);
6768             h = stbi__get16le(s);
6769             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6770                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6771
6772             g->line_size = g->w * 4;
6773             g->start_x = x * 4;
6774             g->start_y = y * g->line_size;
6775             g->max_x   = g->start_x + w * 4;
6776             g->max_y   = g->start_y + h * g->line_size;
6777             g->cur_x   = g->start_x;
6778             g->cur_y   = g->start_y;
6779
6780             // if the width of the specified rectangle is 0, that means
6781             // we may not see *any* pixels or the image is malformed;
6782             // to make sure this is caught, move the current y down to
6783             // max_y (which is what out_gif_code checks).
6784             if (w == 0)
6785                g->cur_y = g->max_y;
6786
6787             g->lflags = stbi__get8(s);
6788
6789             if (g->lflags & 0x40) {
6790                g->step = 8 * g->line_size; // first interlaced spacing
6791                g->parse = 3;
6792             } else {
6793                g->step = g->line_size;
6794                g->parse = 0;
6795             }
6796
6797             if (g->lflags & 0x80) {
6798                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6799                g->color_table = (stbi_uc *) g->lpal;
6800             } else if (g->flags & 0x80) {
6801                g->color_table = (stbi_uc *) g->pal;
6802             } else
6803                return stbi__errpuc("missing color table", "Corrupt GIF");
6804
6805             o = stbi__process_gif_raster(s, g);
6806             if (!o) return NULL;
6807
6808             // if this was the first frame,
6809             pcount = g->w * g->h;
6810             if (first_frame && (g->bgindex > 0)) {
6811                // if first frame, any pixel not drawn to gets the background color
6812                for (pi = 0; pi < pcount; ++pi) {
6813                   if (g->history[pi] == 0) {
6814                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
6815                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
6816                   }
6817                }
6818             }
6819
6820             return o;
6821          }
6822
6823          case 0x21: // Comment Extension.
6824          {
6825             int len;
6826             int ext = stbi__get8(s);
6827             if (ext == 0xF9) { // Graphic Control Extension.
6828                len = stbi__get8(s);
6829                if (len == 4) {
6830                   g->eflags = stbi__get8(s);
6831                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
6832
6833                   // unset old transparent
6834                   if (g->transparent >= 0) {
6835                      g->pal[g->transparent][3] = 255;
6836                   }
6837                   if (g->eflags & 0x01) {
6838                      g->transparent = stbi__get8(s);
6839                      if (g->transparent >= 0) {
6840                         g->pal[g->transparent][3] = 0;
6841                      }
6842                   } else {
6843                      // don't need transparent
6844                      stbi__skip(s, 1);
6845                      g->transparent = -1;
6846                   }
6847                } else {
6848                   stbi__skip(s, len);
6849                   break;
6850                }
6851             }
6852             while ((len = stbi__get8(s)) != 0) {
6853                stbi__skip(s, len);
6854             }
6855             break;
6856          }
6857
6858          case 0x3B: // gif stream termination code
6859             return (stbi_uc *) s; // using '1' causes warning on some compilers
6860
6861          default:
6862             return stbi__errpuc("unknown code", "Corrupt GIF");
6863       }
6864    }
6865 }
6866
6867 static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
6868 {
6869    STBI_FREE(g->out);
6870    STBI_FREE(g->history);
6871    STBI_FREE(g->background);
6872
6873    if (out) STBI_FREE(out);
6874    if (delays && *delays) STBI_FREE(*delays);
6875    return stbi__errpuc("outofmem", "Out of memory");
6876 }
6877
6878 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
6879 {
6880    if (stbi__gif_test(s)) {
6881       int layers = 0;
6882       stbi_uc *u = 0;
6883       stbi_uc *out = 0;
6884       stbi_uc *two_back = 0;
6885       stbi__gif g;
6886       int stride;
6887       int out_size = 0;
6888       int delays_size = 0;
6889
6890       STBI_NOTUSED(out_size);
6891       STBI_NOTUSED(delays_size);
6892
6893       memset(&g, 0, sizeof(g));
6894       if (delays) {
6895          *delays = 0;
6896       }
6897
6898       do {
6899          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
6900          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6901
6902          if (u) {
6903             *x = g.w;
6904             *y = g.h;
6905             ++layers;
6906             stride = g.w * g.h * 4;
6907
6908             if (out) {
6909                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
6910                if (!tmp)
6911                   return stbi__load_gif_main_outofmem(&g, out, delays);
6912                else {
6913                    out = (stbi_uc*) tmp;
6914                    out_size = layers * stride;
6915                }
6916
6917                if (delays) {
6918                   int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
6919                   if (!new_delays)
6920                      return stbi__load_gif_main_outofmem(&g, out, delays);
6921                   *delays = new_delays;
6922                   delays_size = layers * sizeof(int);
6923                }
6924             } else {
6925                out = (stbi_uc*)stbi__malloc( layers * stride );
6926                if (!out)
6927                   return stbi__load_gif_main_outofmem(&g, out, delays);
6928                out_size = layers * stride;
6929                if (delays) {
6930                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
6931                   if (!*delays)
6932                      return stbi__load_gif_main_outofmem(&g, out, delays);
6933                   delays_size = layers * sizeof(int);
6934                }
6935             }
6936             memcpy( out + ((layers - 1) * stride), u, stride );
6937             if (layers >= 2) {
6938                two_back = out - 2 * stride;
6939             }
6940
6941             if (delays) {
6942                (*delays)[layers - 1U] = g.delay;
6943             }
6944          }
6945       } while (u != 0);
6946
6947       // free temp buffer;
6948       STBI_FREE(g.out);
6949       STBI_FREE(g.history);
6950       STBI_FREE(g.background);
6951
6952       // do the final conversion after loading everything;
6953       if (req_comp && req_comp != 4)
6954          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
6955
6956       *z = layers;
6957       return out;
6958    } else {
6959       return stbi__errpuc("not GIF", "Image was not as a gif type.");
6960    }
6961 }
6962
6963 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6964 {
6965    stbi_uc *u = 0;
6966    stbi__gif g;
6967    memset(&g, 0, sizeof(g));
6968    STBI_NOTUSED(ri);
6969
6970    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
6971    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6972    if (u) {
6973       *x = g.w;
6974       *y = g.h;
6975
6976       // moved conversion to after successful load so that the same
6977       // can be done for multiple frames.
6978       if (req_comp && req_comp != 4)
6979          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
6980    } else if (g.out) {
6981       // if there was an error and we allocated an image buffer, free it!
6982       STBI_FREE(g.out);
6983    }
6984
6985    // free buffers needed for multiple frame loading;
6986    STBI_FREE(g.history);
6987    STBI_FREE(g.background);
6988
6989    return u;
6990 }
6991
6992 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
6993 {
6994    return stbi__gif_info_raw(s,x,y,comp);
6995 }
6996 #endif
6997
6998 // *************************************************************************************************
6999 // Radiance RGBE HDR loader
7000 // originally by Nicolas Schulz
7001 #ifndef STBI_NO_HDR
7002 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
7003 {
7004    int i;
7005    for (i=0; signature[i]; ++i)
7006       if (stbi__get8(s) != signature[i])
7007           return 0;
7008    stbi__rewind(s);
7009    return 1;
7010 }
7011
7012 static int stbi__hdr_test(stbi__context* s)
7013 {
7014    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
7015    stbi__rewind(s);
7016    if(!r) {
7017        r = stbi__hdr_test_core(s, "#?RGBE\n");
7018        stbi__rewind(s);
7019    }
7020    return r;
7021 }
7022
7023 #define STBI__HDR_BUFLEN  1024
7024 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
7025 {
7026    int len=0;
7027    char c = '\0';
7028
7029    c = (char) stbi__get8(z);
7030
7031    while (!stbi__at_eof(z) && c != '\n') {
7032       buffer[len++] = c;
7033       if (len == STBI__HDR_BUFLEN-1) {
7034          // flush to end of line
7035          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
7036             ;
7037          break;
7038       }
7039       c = (char) stbi__get8(z);
7040    }
7041
7042    buffer[len] = 0;
7043    return buffer;
7044 }
7045
7046 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
7047 {
7048    if ( input[3] != 0 ) {
7049       float f1;
7050       // Exponent
7051       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
7052       if (req_comp <= 2)
7053          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
7054       else {
7055          output[0] = input[0] * f1;
7056          output[1] = input[1] * f1;
7057          output[2] = input[2] * f1;
7058       }
7059       if (req_comp == 2) output[1] = 1;
7060       if (req_comp == 4) output[3] = 1;
7061    } else {
7062       switch (req_comp) {
7063          case 4: output[3] = 1; /* fallthrough */
7064          case 3: output[0] = output[1] = output[2] = 0;
7065                  break;
7066          case 2: output[1] = 1; /* fallthrough */
7067          case 1: output[0] = 0;
7068                  break;
7069       }
7070    }
7071 }
7072
7073 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7074 {
7075    char buffer[STBI__HDR_BUFLEN];
7076    char *token;
7077    int valid = 0;
7078    int width, height;
7079    stbi_uc *scanline;
7080    float *hdr_data;
7081    int len;
7082    unsigned char count, value;
7083    int i, j, k, c1,c2, z;
7084    const char *headerToken;
7085    STBI_NOTUSED(ri);
7086
7087    // Check identifier
7088    headerToken = stbi__hdr_gettoken(s,buffer);
7089    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
7090       return stbi__errpf("not HDR", "Corrupt HDR image");
7091
7092    // Parse header
7093    for(;;) {
7094       token = stbi__hdr_gettoken(s,buffer);
7095       if (token[0] == 0) break;
7096       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7097    }
7098
7099    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
7100
7101    // Parse width and height
7102    // can't use sscanf() if we're not using stdio!
7103    token = stbi__hdr_gettoken(s,buffer);
7104    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7105    token += 3;
7106    height = (int) strtol(token, &token, 10);
7107    while (*token == ' ') ++token;
7108    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7109    token += 3;
7110    width = (int) strtol(token, NULL, 10);
7111
7112    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7113    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7114
7115    *x = width;
7116    *y = height;
7117
7118    if (comp) *comp = 3;
7119    if (req_comp == 0) req_comp = 3;
7120
7121    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
7122       return stbi__errpf("too large", "HDR image is too large");
7123
7124    // Read data
7125    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
7126    if (!hdr_data)
7127       return stbi__errpf("outofmem", "Out of memory");
7128
7129    // Load image data
7130    // image data is stored as some number of sca
7131    if ( width < 8 || width >= 32768) {
7132       // Read flat data
7133       for (j=0; j < height; ++j) {
7134          for (i=0; i < width; ++i) {
7135             stbi_uc rgbe[4];
7136            main_decode_loop:
7137             stbi__getn(s, rgbe, 4);
7138             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
7139          }
7140       }
7141    } else {
7142       // Read RLE-encoded data
7143       scanline = NULL;
7144
7145       for (j = 0; j < height; ++j) {
7146          c1 = stbi__get8(s);
7147          c2 = stbi__get8(s);
7148          len = stbi__get8(s);
7149          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
7150             // not run-length encoded, so we have to actually use THIS data as a decoded
7151             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
7152             stbi_uc rgbe[4];
7153             rgbe[0] = (stbi_uc) c1;
7154             rgbe[1] = (stbi_uc) c2;
7155             rgbe[2] = (stbi_uc) len;
7156             rgbe[3] = (stbi_uc) stbi__get8(s);
7157             stbi__hdr_convert(hdr_data, rgbe, req_comp);
7158             i = 1;
7159             j = 0;
7160             STBI_FREE(scanline);
7161             goto main_decode_loop; // yes, this makes no sense
7162          }
7163          len <<= 8;
7164          len |= stbi__get8(s);
7165          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
7166          if (scanline == NULL) {
7167             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
7168             if (!scanline) {
7169                STBI_FREE(hdr_data);
7170                return stbi__errpf("outofmem", "Out of memory");
7171             }
7172          }
7173
7174          for (k = 0; k < 4; ++k) {
7175             int nleft;
7176             i = 0;
7177             while ((nleft = width - i) > 0) {
7178                count = stbi__get8(s);
7179                if (count > 128) {
7180                   // Run
7181                   value = stbi__get8(s);
7182                   count -= 128;
7183                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7184                   for (z = 0; z < count; ++z)
7185                      scanline[i++ * 4 + k] = value;
7186                } else {
7187                   // Dump
7188                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7189                   for (z = 0; z < count; ++z)
7190                      scanline[i++ * 4 + k] = stbi__get8(s);
7191                }
7192             }
7193          }
7194          for (i=0; i < width; ++i)
7195             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
7196       }
7197       if (scanline)
7198          STBI_FREE(scanline);
7199    }
7200
7201    return hdr_data;
7202 }
7203
7204 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
7205 {
7206    char buffer[STBI__HDR_BUFLEN];
7207    char *token;
7208    int valid = 0;
7209    int dummy;
7210
7211    if (!x) x = &dummy;
7212    if (!y) y = &dummy;
7213    if (!comp) comp = &dummy;
7214
7215    if (stbi__hdr_test(s) == 0) {
7216        stbi__rewind( s );
7217        return 0;
7218    }
7219
7220    for(;;) {
7221       token = stbi__hdr_gettoken(s,buffer);
7222       if (token[0] == 0) break;
7223       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7224    }
7225
7226    if (!valid) {
7227        stbi__rewind( s );
7228        return 0;
7229    }
7230    token = stbi__hdr_gettoken(s,buffer);
7231    if (strncmp(token, "-Y ", 3)) {
7232        stbi__rewind( s );
7233        return 0;
7234    }
7235    token += 3;
7236    *y = (int) strtol(token, &token, 10);
7237    while (*token == ' ') ++token;
7238    if (strncmp(token, "+X ", 3)) {
7239        stbi__rewind( s );
7240        return 0;
7241    }
7242    token += 3;
7243    *x = (int) strtol(token, NULL, 10);
7244    *comp = 3;
7245    return 1;
7246 }
7247 #endif // STBI_NO_HDR
7248
7249 #ifndef STBI_NO_BMP
7250 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
7251 {
7252    void *p;
7253    stbi__bmp_data info;
7254
7255    info.all_a = 255;
7256    p = stbi__bmp_parse_header(s, &info);
7257    if (p == NULL) {
7258       stbi__rewind( s );
7259       return 0;
7260    }
7261    if (x) *x = s->img_x;
7262    if (y) *y = s->img_y;
7263    if (comp) {
7264       if (info.bpp == 24 && info.ma == 0xff000000)
7265          *comp = 3;
7266       else
7267          *comp = info.ma ? 4 : 3;
7268    }
7269    return 1;
7270 }
7271 #endif
7272
7273 #ifndef STBI_NO_PSD
7274 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
7275 {
7276    int channelCount, dummy, depth;
7277    if (!x) x = &dummy;
7278    if (!y) y = &dummy;
7279    if (!comp) comp = &dummy;
7280    if (stbi__get32be(s) != 0x38425053) {
7281        stbi__rewind( s );
7282        return 0;
7283    }
7284    if (stbi__get16be(s) != 1) {
7285        stbi__rewind( s );
7286        return 0;
7287    }
7288    stbi__skip(s, 6);
7289    channelCount = stbi__get16be(s);
7290    if (channelCount < 0 || channelCount > 16) {
7291        stbi__rewind( s );
7292        return 0;
7293    }
7294    *y = stbi__get32be(s);
7295    *x = stbi__get32be(s);
7296    depth = stbi__get16be(s);
7297    if (depth != 8 && depth != 16) {
7298        stbi__rewind( s );
7299        return 0;
7300    }
7301    if (stbi__get16be(s) != 3) {
7302        stbi__rewind( s );
7303        return 0;
7304    }
7305    *comp = 4;
7306    return 1;
7307 }
7308
7309 static int stbi__psd_is16(stbi__context *s)
7310 {
7311    int channelCount, depth;
7312    if (stbi__get32be(s) != 0x38425053) {
7313        stbi__rewind( s );
7314        return 0;
7315    }
7316    if (stbi__get16be(s) != 1) {
7317        stbi__rewind( s );
7318        return 0;
7319    }
7320    stbi__skip(s, 6);
7321    channelCount = stbi__get16be(s);
7322    if (channelCount < 0 || channelCount > 16) {
7323        stbi__rewind( s );
7324        return 0;
7325    }
7326    STBI_NOTUSED(stbi__get32be(s));
7327    STBI_NOTUSED(stbi__get32be(s));
7328    depth = stbi__get16be(s);
7329    if (depth != 16) {
7330        stbi__rewind( s );
7331        return 0;
7332    }
7333    return 1;
7334 }
7335 #endif
7336
7337 #ifndef STBI_NO_PIC
7338 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
7339 {
7340    int act_comp=0,num_packets=0,chained,dummy;
7341    stbi__pic_packet packets[10];
7342
7343    if (!x) x = &dummy;
7344    if (!y) y = &dummy;
7345    if (!comp) comp = &dummy;
7346
7347    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
7348       stbi__rewind(s);
7349       return 0;
7350    }
7351
7352    stbi__skip(s, 88);
7353
7354    *x = stbi__get16be(s);
7355    *y = stbi__get16be(s);
7356    if (stbi__at_eof(s)) {
7357       stbi__rewind( s);
7358       return 0;
7359    }
7360    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
7361       stbi__rewind( s );
7362       return 0;
7363    }
7364
7365    stbi__skip(s, 8);
7366
7367    do {
7368       stbi__pic_packet *packet;
7369
7370       if (num_packets==sizeof(packets)/sizeof(packets[0]))
7371          return 0;
7372
7373       packet = &packets[num_packets++];
7374       chained = stbi__get8(s);
7375       packet->size    = stbi__get8(s);
7376       packet->type    = stbi__get8(s);
7377       packet->channel = stbi__get8(s);
7378       act_comp |= packet->channel;
7379
7380       if (stbi__at_eof(s)) {
7381           stbi__rewind( s );
7382           return 0;
7383       }
7384       if (packet->size != 8) {
7385           stbi__rewind( s );
7386           return 0;
7387       }
7388    } while (chained);
7389
7390    *comp = (act_comp & 0x10 ? 4 : 3);
7391
7392    return 1;
7393 }
7394 #endif
7395
7396 // *************************************************************************************************
7397 // Portable Gray Map and Portable Pixel Map loader
7398 // by Ken Miller
7399 //
7400 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
7401 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
7402 //
7403 // Known limitations:
7404 //    Does not support comments in the header section
7405 //    Does not support ASCII image data (formats P2 and P3)
7406
7407 #ifndef STBI_NO_PNM
7408
7409 static int      stbi__pnm_test(stbi__context *s)
7410 {
7411    char p, t;
7412    p = (char) stbi__get8(s);
7413    t = (char) stbi__get8(s);
7414    if (p != 'P' || (t != '5' && t != '6')) {
7415        stbi__rewind( s );
7416        return 0;
7417    }
7418    return 1;
7419 }
7420
7421 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7422 {
7423    stbi_uc *out;
7424    STBI_NOTUSED(ri);
7425
7426    ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
7427    if (ri->bits_per_channel == 0)
7428       return 0;
7429
7430    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7431    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7432
7433    *x = s->img_x;
7434    *y = s->img_y;
7435    if (comp) *comp = s->img_n;
7436
7437    if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
7438       return stbi__errpuc("too large", "PNM too large");
7439
7440    out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
7441    if (!out) return stbi__errpuc("outofmem", "Out of memory");
7442    stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8));
7443
7444    if (req_comp && req_comp != s->img_n) {
7445       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
7446       if (out == NULL) return out; // stbi__convert_format frees input on failure
7447    }
7448    return out;
7449 }
7450
7451 static int      stbi__pnm_isspace(char c)
7452 {
7453    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
7454 }
7455
7456 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
7457 {
7458    for (;;) {
7459       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
7460          *c = (char) stbi__get8(s);
7461
7462       if (stbi__at_eof(s) || *c != '#')
7463          break;
7464
7465       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
7466          *c = (char) stbi__get8(s);
7467    }
7468 }
7469
7470 static int      stbi__pnm_isdigit(char c)
7471 {
7472    return c >= '0' && c <= '9';
7473 }
7474
7475 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
7476 {
7477    int value = 0;
7478
7479    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
7480       value = value*10 + (*c - '0');
7481       *c = (char) stbi__get8(s);
7482    }
7483
7484    return value;
7485 }
7486
7487 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
7488 {
7489    int maxv, dummy;
7490    char c, p, t;
7491
7492    if (!x) x = &dummy;
7493    if (!y) y = &dummy;
7494    if (!comp) comp = &dummy;
7495
7496    stbi__rewind(s);
7497
7498    // Get identifier
7499    p = (char) stbi__get8(s);
7500    t = (char) stbi__get8(s);
7501    if (p != 'P' || (t != '5' && t != '6')) {
7502        stbi__rewind(s);
7503        return 0;
7504    }
7505
7506    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
7507
7508    c = (char) stbi__get8(s);
7509    stbi__pnm_skip_whitespace(s, &c);
7510
7511    *x = stbi__pnm_getinteger(s, &c); // read width
7512    stbi__pnm_skip_whitespace(s, &c);
7513
7514    *y = stbi__pnm_getinteger(s, &c); // read height
7515    stbi__pnm_skip_whitespace(s, &c);
7516
7517    maxv = stbi__pnm_getinteger(s, &c);  // read max value
7518    if (maxv > 65535)
7519       return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
7520    else if (maxv > 255)
7521       return 16;
7522    else
7523       return 8;
7524 }
7525
7526 static int stbi__pnm_is16(stbi__context *s)
7527 {
7528    if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
7529            return 1;
7530    return 0;
7531 }
7532 #endif
7533
7534 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
7535 {
7536    #ifndef STBI_NO_JPEG
7537    if (stbi__jpeg_info(s, x, y, comp)) return 1;
7538    #endif
7539
7540    #ifndef STBI_NO_PNG
7541    if (stbi__png_info(s, x, y, comp))  return 1;
7542    #endif
7543
7544    #ifndef STBI_NO_GIF
7545    if (stbi__gif_info(s, x, y, comp))  return 1;
7546    #endif
7547
7548    #ifndef STBI_NO_BMP
7549    if (stbi__bmp_info(s, x, y, comp))  return 1;
7550    #endif
7551
7552    #ifndef STBI_NO_PSD
7553    if (stbi__psd_info(s, x, y, comp))  return 1;
7554    #endif
7555
7556    #ifndef STBI_NO_PIC
7557    if (stbi__pic_info(s, x, y, comp))  return 1;
7558    #endif
7559
7560    #ifndef STBI_NO_PNM
7561    if (stbi__pnm_info(s, x, y, comp))  return 1;
7562    #endif
7563
7564    #ifndef STBI_NO_HDR
7565    if (stbi__hdr_info(s, x, y, comp))  return 1;
7566    #endif
7567
7568    // test tga last because it's a crappy test!
7569    #ifndef STBI_NO_TGA
7570    if (stbi__tga_info(s, x, y, comp))
7571        return 1;
7572    #endif
7573    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
7574 }
7575
7576 static int stbi__is_16_main(stbi__context *s)
7577 {
7578    #ifndef STBI_NO_PNG
7579    if (stbi__png_is16(s))  return 1;
7580    #endif
7581
7582    #ifndef STBI_NO_PSD
7583    if (stbi__psd_is16(s))  return 1;
7584    #endif
7585
7586    #ifndef STBI_NO_PNM
7587    if (stbi__pnm_is16(s))  return 1;
7588    #endif
7589    return 0;
7590 }
7591
7592 #ifndef STBI_NO_STDIO
7593 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
7594 {
7595     FILE *f = stbi__fopen(filename, "rb");
7596     int result;
7597     if (!f) return stbi__err("can't fopen", "Unable to open file");
7598     result = stbi_info_from_file(f, x, y, comp);
7599     fclose(f);
7600     return result;
7601 }
7602
7603 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
7604 {
7605    int r;
7606    stbi__context s;
7607    long pos = ftell(f);
7608    stbi__start_file(&s, f);
7609    r = stbi__info_main(&s,x,y,comp);
7610    fseek(f,pos,SEEK_SET);
7611    return r;
7612 }
7613
7614 STBIDEF int stbi_is_16_bit(char const *filename)
7615 {
7616     FILE *f = stbi__fopen(filename, "rb");
7617     int result;
7618     if (!f) return stbi__err("can't fopen", "Unable to open file");
7619     result = stbi_is_16_bit_from_file(f);
7620     fclose(f);
7621     return result;
7622 }
7623
7624 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
7625 {
7626    int r;
7627    stbi__context s;
7628    long pos = ftell(f);
7629    stbi__start_file(&s, f);
7630    r = stbi__is_16_main(&s);
7631    fseek(f,pos,SEEK_SET);
7632    return r;
7633 }
7634 #endif // !STBI_NO_STDIO
7635
7636 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
7637 {
7638    stbi__context s;
7639    stbi__start_mem(&s,buffer,len);
7640    return stbi__info_main(&s,x,y,comp);
7641 }
7642
7643 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
7644 {
7645    stbi__context s;
7646    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7647    return stbi__info_main(&s,x,y,comp);
7648 }
7649
7650 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
7651 {
7652    stbi__context s;
7653    stbi__start_mem(&s,buffer,len);
7654    return stbi__is_16_main(&s);
7655 }
7656
7657 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
7658 {
7659    stbi__context s;
7660    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7661    return stbi__is_16_main(&s);
7662 }
7663
7664 #endif // STB_IMAGE_IMPLEMENTATION
7665
7666 /*
7667    revision history:
7668       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
7669       2.19  (2018-02-11) fix warning
7670       2.18  (2018-01-30) fix warnings
7671       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
7672                          1-bit BMP
7673                          *_is_16_bit api
7674                          avoid warnings
7675       2.16  (2017-07-23) all functions have 16-bit variants;
7676                          STBI_NO_STDIO works again;
7677                          compilation fixes;
7678                          fix rounding in unpremultiply;
7679                          optimize vertical flip;
7680                          disable raw_len validation;
7681                          documentation fixes
7682       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
7683                          warning fixes; disable run-time SSE detection on gcc;
7684                          uniform handling of optional "return" values;
7685                          thread-safe initialization of zlib tables
7686       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
7687       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
7688       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7689       2.11  (2016-04-02) allocate large structures on the stack
7690                          remove white matting for transparent PSD
7691                          fix reported channel count for PNG & BMP
7692                          re-enable SSE2 in non-gcc 64-bit
7693                          support RGB-formatted JPEG
7694                          read 16-bit PNGs (only as 8-bit)
7695       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7696       2.09  (2016-01-16) allow comments in PNM files
7697                          16-bit-per-pixel TGA (not bit-per-component)
7698                          info() for TGA could break due to .hdr handling
7699                          info() for BMP to shares code instead of sloppy parse
7700                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7701                          code cleanup
7702       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7703       2.07  (2015-09-13) fix compiler warnings
7704                          partial animated GIF support
7705                          limited 16-bpc PSD support
7706                          #ifdef unused functions
7707                          bug with < 92 byte PIC,PNM,HDR,TGA
7708       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
7709       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
7710       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7711       2.03  (2015-04-12) extra corruption checking (mmozeiko)
7712                          stbi_set_flip_vertically_on_load (nguillemot)
7713                          fix NEON support; fix mingw support
7714       2.02  (2015-01-19) fix incorrect assert, fix warning
7715       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7716       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7717       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7718                          progressive JPEG (stb)
7719                          PGM/PPM support (Ken Miller)
7720                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
7721                          GIF bugfix -- seemingly never worked
7722                          STBI_NO_*, STBI_ONLY_*
7723       1.48  (2014-12-14) fix incorrectly-named assert()
7724       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7725                          optimize PNG (ryg)
7726                          fix bug in interlaced PNG with user-specified channel count (stb)
7727       1.46  (2014-08-26)
7728               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7729       1.45  (2014-08-16)
7730               fix MSVC-ARM internal compiler error by wrapping malloc
7731       1.44  (2014-08-07)
7732               various warning fixes from Ronny Chevalier
7733       1.43  (2014-07-15)
7734               fix MSVC-only compiler problem in code changed in 1.42
7735       1.42  (2014-07-09)
7736               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7737               fixes to stbi__cleanup_jpeg path
7738               added STBI_ASSERT to avoid requiring assert.h
7739       1.41  (2014-06-25)
7740               fix search&replace from 1.36 that messed up comments/error messages
7741       1.40  (2014-06-22)
7742               fix gcc struct-initialization warning
7743       1.39  (2014-06-15)
7744               fix to TGA optimization when req_comp != number of components in TGA;
7745               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7746               add support for BMP version 5 (more ignored fields)
7747       1.38  (2014-06-06)
7748               suppress MSVC warnings on integer casts truncating values
7749               fix accidental rename of 'skip' field of I/O
7750       1.37  (2014-06-04)
7751               remove duplicate typedef
7752       1.36  (2014-06-03)
7753               convert to header file single-file library
7754               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7755       1.35  (2014-05-27)
7756               various warnings
7757               fix broken STBI_SIMD path
7758               fix bug where stbi_load_from_file no longer left file pointer in correct place
7759               fix broken non-easy path for 32-bit BMP (possibly never used)
7760               TGA optimization by Arseny Kapoulkine
7761       1.34  (unknown)
7762               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7763       1.33  (2011-07-14)
7764               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7765       1.32  (2011-07-13)
7766               support for "info" function for all supported filetypes (SpartanJ)
7767       1.31  (2011-06-20)
7768               a few more leak fixes, bug in PNG handling (SpartanJ)
7769       1.30  (2011-06-11)
7770               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
7771               removed deprecated format-specific test/load functions
7772               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7773               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7774               fix inefficiency in decoding 32-bit BMP (David Woo)
7775       1.29  (2010-08-16)
7776               various warning fixes from Aurelien Pocheville
7777       1.28  (2010-08-01)
7778               fix bug in GIF palette transparency (SpartanJ)
7779       1.27  (2010-08-01)
7780               cast-to-stbi_uc to fix warnings
7781       1.26  (2010-07-24)
7782               fix bug in file buffering for PNG reported by SpartanJ
7783       1.25  (2010-07-17)
7784               refix trans_data warning (Won Chun)
7785       1.24  (2010-07-12)
7786               perf improvements reading from files on platforms with lock-heavy fgetc()
7787               minor perf improvements for jpeg
7788               deprecated type-specific functions so we'll get feedback if they're needed
7789               attempt to fix trans_data warning (Won Chun)
7790       1.23    fixed bug in iPhone support
7791       1.22  (2010-07-10)
7792               removed image *writing* support
7793               stbi_info support from Jetro Lauha
7794               GIF support from Jean-Marc Lienher
7795               iPhone PNG-extensions from James Brown
7796               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7797       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
7798       1.20    added support for Softimage PIC, by Tom Seddon
7799       1.19    bug in interlaced PNG corruption check (found by ryg)
7800       1.18  (2008-08-02)
7801               fix a threading bug (local mutable static)
7802       1.17    support interlaced PNG
7803       1.16    major bugfix - stbi__convert_format converted one too many pixels
7804       1.15    initialize some fields for thread safety
7805       1.14    fix threadsafe conversion bug
7806               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7807       1.13    threadsafe
7808       1.12    const qualifiers in the API
7809       1.11    Support installable IDCT, colorspace conversion routines
7810       1.10    Fixes for 64-bit (don't use "unsigned long")
7811               optimized upsampling by Fabian "ryg" Giesen
7812       1.09    Fix format-conversion for PSD code (bad global variables!)
7813       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7814       1.07    attempt to fix C++ warning/errors again
7815       1.06    attempt to fix C++ warning/errors again
7816       1.05    fix TGA loading to return correct *comp and use good luminance calc
7817       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
7818       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7819       1.02    support for (subset of) HDR files, float interface for preferred access to them
7820       1.01    fix bug: possible bug in handling right-side up bmps... not sure
7821               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7822       1.00    interface to zlib that skips zlib header
7823       0.99    correct handling of alpha in palette
7824       0.98    TGA loader by lonesock; dynamically add loaders (untested)
7825       0.97    jpeg errors on too large a file; also catch another malloc failure
7826       0.96    fix detection of invalid v value - particleman@mollyrocket forum
7827       0.95    during header scan, seek to markers in case of padding
7828       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7829       0.93    handle jpegtran output; verbose errors
7830       0.92    read 4,8,16,24,32-bit BMP files of several formats
7831       0.91    output 24-bit Windows 3.0 BMP files
7832       0.90    fix a few more warnings; bump version number to approach 1.0
7833       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
7834       0.60    fix compiling as c++
7835       0.59    fix warnings: merge Dave Moore's -Wall fixes
7836       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
7837       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7838       0.56    fix bug: zlib uncompressed mode len vs. nlen
7839       0.55    fix bug: restart_interval not initialized to 0
7840       0.54    allow NULL for 'int *comp'
7841       0.53    fix bug in png 3->4; speedup png decoding
7842       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7843       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
7844               on 'test' only check type, not whether we support this variant
7845       0.50  (2006-11-19)
7846               first released version
7847 */
7848
7849
7850 /*
7851 ------------------------------------------------------------------------------
7852 This software is available under 2 licenses -- choose whichever you prefer.
7853 ------------------------------------------------------------------------------
7854 ALTERNATIVE A - MIT License
7855 Copyright (c) 2017 Sean Barrett
7856 Permission is hereby granted, free of charge, to any person obtaining a copy of
7857 this software and associated documentation files (the "Software"), to deal in
7858 the Software without restriction, including without limitation the rights to
7859 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7860 of the Software, and to permit persons to whom the Software is furnished to do
7861 so, subject to the following conditions:
7862 The above copyright notice and this permission notice shall be included in all
7863 copies or substantial portions of the Software.
7864 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7865 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7866 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7867 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
7868 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
7869 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
7870 SOFTWARE.
7871 ------------------------------------------------------------------------------
7872 ALTERNATIVE B - Public Domain (www.unlicense.org)
7873 This is free and unencumbered software released into the public domain.
7874 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
7875 software, either in source code form or as a compiled binary, for any purpose,
7876 commercial or non-commercial, and by any means.
7877 In jurisdictions that recognize copyright laws, the author or authors of this
7878 software dedicate any and all copyright interest in the software to the public
7879 domain. We make this dedication for the benefit of the public at large and to
7880 the detriment of our heirs and successors. We intend this dedication to be an
7881 overt act of relinquishment in perpetuity of all present and future rights to
7882 this software under copyright law.
7883 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7884 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7885 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7886 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
7887 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
7888 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
7889 ------------------------------------------------------------------------------
7890 */