Making Photos Smaller Without Quality Loss – by Yelp

Yelp has over 100 million user-generated photos ranging from pictures of dinners or haircuts, to one of our newest features, #yelfies. These images account for a majority of the bandwidth for users of the app and website, and represent a significant cost to store and transfer. In our quest to give our users the best experience, we worked hard to optimize our photos and were able to achieve a 30% average size reduction. This saves our users time and bandwidth and reduces our cost to serve those images. Oh, and we did it all without reducing the quality of these images!


Yelp has been storing user-uploaded photos for over 12 years. We save lossless formats (PNG, GIF) as PNGs and all other formats as JPEG. We use Python and Pillow for saving images, and start our story of photo uploads with a snippet like this:
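The snippet itself isn't reproduced here; the following is a minimal sketch of the kind of save path described, assuming the 640×640 thumbnail size and the fixed quality of 85 mentioned later in the post (the function name is illustrative):

```python
from io import BytesIO
from PIL import Image

def make_thumbnail(photo, fmt, size=(640, 640)):
    # Downscale a copy, preserving aspect ratio, then re-encode
    # in the target format (JPEG, or PNG for lossless uploads).
    new_photo = photo.copy()
    new_photo.thumbnail(size, resample=Image.LANCZOS)
    out = BytesIO()
    save_args = {"format": fmt}
    if fmt == "JPEG":
        save_args["quality"] = 85  # fixed baseline quality
    new_photo.save(out, **save_args)
    return out.getvalue()
```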

With this as a starting point, we began to investigate potential optimizations on file size that we could apply without a loss in quality.


First, we had to decide whether to handle this ourselves or let a CDN provider magically change our photos. With the priority we place on high quality content, it made sense to evaluate options and make potential size vs quality tradeoffs ourselves. We moved ahead with research on the current state of photo file size reduction – what changes could be made and how much size / quality reduction was associated with each. With this research completed, we decided to work on three primary categories. The rest of this post explains what we did and how much benefit we realized from each optimization.

  1. Changes in Pillow
    • Optimize flag
    • Progressive JPEG
  2. Changes to application photo logic
    • Large PNG detection
    • Dynamic JPEG quality
  3. Changes to JPEG encoder
    • Mozjpeg (trellis quantization, custom quantization matrix)

Changes in Pillow

Optimize Flag

This is one of the easiest changes we made: enabling the setting in Pillow responsible for additional file size savings at the cost of CPU time (optimize=True). Due to the nature of the tradeoff being made, this does not impact image quality at all.

For JPEG, this flag instructs the encoder to find the optimal Huffman coding by making an additional pass over each image scan. The first pass, instead of writing to the file, calculates the occurrence statistics of each value, the information required to compute the ideal coding. PNG internally uses zlib, so in that case the optimize flag effectively instructs the encoder to use gzip -9 instead of gzip -6.

This is an easy change to make but it turns out that it is not a silver bullet, reducing file size by just a few percent.
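As a sketch, the flag is a single extra keyword argument to Pillow's save (a minimal illustration, not Yelp's actual code; the function name is an assumption):

```python
from io import BytesIO
from PIL import Image

def save_optimized(photo, fmt):
    # optimize=True: for JPEG, an extra pass computes optimal Huffman
    # tables; for PNG, zlib compresses at its highest level.
    # Lossless in both cases -- only CPU time is spent.
    out = BytesIO()
    photo.save(out, format=fmt, optimize=True)
    return out.getvalue()
```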

Progressive JPEG

When saving an image as a JPEG, there are a few different types you can choose from:

  • Baseline JPEG images load from top to bottom.
  • Progressive JPEG images load from more blurry to less blurry. The progressive option can easily be enabled in Pillow (progressive=True). As a result, there is a perceived performance increase (that is, it’s easier to notice when an image is partially absent than it is to tell it’s not fully sharp).

Additionally, the way progressive files are packed generally results in a small reduction in file size. As the Wikipedia article explains more fully, the JPEG format uses a zigzag pattern over the 8×8 blocks of pixels to do entropy coding. When the values of those blocks of pixels are unpacked and laid out in order, you generally have non-zero numbers first and then sequences of 0s, with that pattern repeating and interleaved for each 8×8 block in the image. With progressive encoding, the order of the unwound pixel blocks changes. The higher-value numbers for each block come first in the file (which gives the earliest scans of a progressive image their distinct blockiness), and the longer spans of small numbers, including more 0s, that add the finer details come towards the end. This reordering of the image data doesn't change the image itself, but it does increase the number of 0s that can appear in a row (which can be compressed more easily).
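Enabling progressive encoding is likewise a one-flag change in Pillow; a minimal sketch (the function name is illustrative):

```python
from io import BytesIO
from PIL import Image

def save_progressive(photo):
    # progressive=True writes multiple coarse-to-fine scans
    # instead of a single top-to-bottom baseline scan.
    out = BytesIO()
    photo.save(out, format="JPEG", progressive=True, optimize=True)
    return out.getvalue()
```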

Comparison with a delicious user-contributed image of a donut (click for larger):

(left) A mock of how a baseline JPEG renders.

(right) A mock of how a progressive JPEG renders.

Changes to Application Photo Logic

Large PNG Detection

Yelp targets two image formats for serving user-generated content: JPEG and PNG. JPEG is a great format for photos but generally struggles with high-contrast design content (like logos). PNG, by contrast, is fully lossless, so it's great for graphics but too large for photos, where small distortions are not visible. In the cases where users upload PNGs that are actually photographs, we can save a lot of space if we identify these files and save them as JPEG instead. Some common sources of PNG photos on Yelp are screenshots taken by mobile devices and apps that modify photos to add effects or borders.

(left) A typical composited PNG upload with logo and border. (right) A typical PNG upload from a screenshot.

We wanted to reduce the number of these unnecessary PNGs, but it was important to avoid overreaching and changing format or degrading quality of logos, graphics, etc. How can we tell if something is a photo? From the pixels?

Using an experimental sample of 2,500 images, we found that a combination of file size and unique pixels worked well to detect photos. We generate a candidate thumbnail image at our largest resolution and see if the output PNG file is larger than 300KiB. If it is, we’ll also check the image contents to see if there are over 2^16 unique colors (Yelp converts RGBA image uploads to RGB, but if we didn’t, we would check that too).

In the experimental dataset, these hand-tuned thresholds to define “bigness” captured 88% of the possible file size savings (i.e. our expected file size savings if we were to convert all of the images) without any false-positives of graphics being converted.

Dynamic JPEG Quality

The first and most well-known way to reduce the size of JPEG files is a setting called quality. Many applications capable of saving to the JPEG format specify quality as a number.

Quality is somewhat of an abstraction. In fact, there are separate qualities for each of the color channels of a JPEG image. Quality levels 0 – 100 map to different quantization tables for the color channels, determining how much data is lost (usually high frequency). Quantization in the signal domain is the one step in the JPEG encoding process that loses information.

The simplest way to reduce file size is to reduce the quality of the image, introducing more noise. Not every image loses the same amount of information at a given quality level though.

We can dynamically choose a quality setting which is optimized for each image, finding an ideal balance between quality and size. There are two ways to do this:

  • Bottom-up: These are algorithms that generate tuned quantization tables by processing the image at the 8×8 pixel block level. They calculate both how much theoretical quality was lost and how that lost data either amplifies or cancels out to be more or less visible to the human eye.
  • Top-down: These are algorithms that compare an entire image against an original version of itself and detect how much information was lost. By iteratively generating candidate images with different quality settings, we can choose the one that meets a minimum evaluated level by whichever evaluation algorithm we choose.

We evaluated a bottom-up algorithm, which in our experience did not yield suitable results at the higher end of the quality range we wanted to use (though it may still have potential in the mid-range of image qualities, where an encoder can afford to be more adventurous with the bytes it discards). Many of the scholarly papers on this strategy were published in the early 90s, when computing power was at a premium, and took shortcuts that the top-down approach addresses, such as not evaluating interactions across blocks.

So we took the top-down approach: use a bisection algorithm to generate candidate images at different quality levels, and evaluate each candidate's drop in quality by calculating its structural similarity metric (SSIM) using pyssim, until that value meets a configurable static threshold. This enables us to selectively lower the average file size (and average quality) only for images where the decrease isn't perceivable to begin with.

In the below chart, we plot the SSIM values of 2500 images regenerated via 3 different quality approaches.

  1. The original images made by the current approach at quality = 85 are plotted as the blue line.
  2. An alternative approach to lowering file size, changing quality = 80, is plotted as the red line.
  3. And finally, the approach we ended up using, dynamic quality (SSIM 80-85, in orange), chooses a quality in the range 80 to 85 (inclusive) for each image, based on meeting or exceeding an SSIM ratio: a pre-computed static value chosen so that the transition occurs somewhere in the middle of the range of images. This lets us lower the average file size without lowering the quality of our worst-quality images.

SSIMs of 2500 images with 3 different quality strategies.


There are quite a few image quality algorithms that try to mimic the human vision system. We’ve evaluated many of these and think that SSIM, while older, is most suitable for this iterative optimization based on a few characteristics:

  1. Sensitive to JPEG quantization error
  2. Fast, simple algorithm
  3. Can be computed on PIL native image objects without converting images to PNG and passing them to CLI applications (see #2)

Example Code for Dynamic Quality:
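The original example code isn't reproduced here; below is a minimal sketch of the bisection approach described above. The similarity function is a crude luminance-based stand-in for SSIM (the real pipeline used pyssim), and the function names and the 0.98 threshold are illustrative assumptions:

```python
from io import BytesIO
from PIL import Image, ImageChops, ImageStat

def similarity(original, candidate):
    # Stand-in for SSIM: mean per-pixel closeness on the
    # luminance channel, scaled to 0..1 (1.0 = identical).
    diff = ImageChops.difference(original.convert("L"), candidate.convert("L"))
    return 1.0 - ImageStat.Stat(diff).mean[0] / 255.0

def encode_at_quality(im, quality):
    # Re-encode as JPEG at the given quality and decode it back.
    buf = BytesIO()
    im.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def find_best_quality(im, lo=80, hi=85, threshold=0.98):
    # Bisect the quality range, keeping the lowest quality whose
    # similarity to the original still meets the threshold.
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if similarity(im, encode_at_quality(im, mid)) >= threshold:
            best = mid
            hi = mid - 1
        else:
            lo = mid + 1
    return best
```

In production the comparison would be an actual SSIM implementation and the threshold the pre-computed static value the post describes.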

There are a few other blog posts about this technique; here is one by Colt McAnlis. And as we go to press, Etsy has published one too! High five, faster internet!

Changes to JPEG Encoder


Mozjpeg

Mozjpeg is an open-source fork of libjpeg-turbo, which trades execution time for file size. This approach meshes well with the offline batch approach to regenerating images. With the investment of about 3-5x more time than libjpeg-turbo, a few more expensive algorithms make images smaller!

One of mozjpeg’s differentiators is the use of an alternative quantization table. As mentioned above, quality is an abstraction of the quantization tables used for each color channel. All signs point to the default JPEG quantization tables as being pretty easy to beat. In the words of the JPEG spec:

These tables are provided as examples only and are not necessarily suitable for any particular application.

So naturally, it shouldn’t surprise you to learn that these tables are the default used by most encoder implementations… 🤔🤔🤔

Mozjpeg has gone through the trouble of benchmarking alternative tables for us, and uses the best performing general-purpose alternative for images it creates.

Mozjpeg + Pillow

Most Linux distributions ship libjpeg by default, so using mozjpeg under Pillow doesn't work out of the box, but configuring it isn't terribly difficult either. When you build mozjpeg, use the --with-jpeg8 flag and make sure it is installed somewhere Pillow can find and link it. If you're using Docker, you might have a Dockerfile like:
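The original Dockerfile isn't reproduced here; the following is an illustrative sketch, in which the base image, mozjpeg version, download URL, and install paths are all assumptions to adapt for your environment:

```dockerfile
# Illustrative only: base image, version, and paths are assumptions.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
    build-essential autoconf automake libtool nasm pkg-config wget python-pip

# Build mozjpeg as the system libjpeg (note --with-jpeg8, as discussed above)
RUN wget -O mozjpeg.tar.gz https://github.com/mozilla/mozjpeg/archive/v3.2.tar.gz \
    && tar xzf mozjpeg.tar.gz && cd mozjpeg-3.2 \
    && autoreconf -fiv && ./configure --with-jpeg8 --prefix=/usr \
    && make && make install && ldconfig

# Building Pillow from source makes it link against mozjpeg's libjpeg
RUN pip install --no-cache-dir --no-binary :all: Pillow
```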

That’s it! Build it and you’ll be able to use Pillow backed by mozjpeg within your normal images workflow.


Results

How much did each of those improvements matter for us? We started this research by randomly sampling 2,500 of Yelp’s business photos to put through our processing pipeline and measure the impact on file size.

  1. Changes to Pillow settings were responsible for about 4.5% of the savings
  2. Large PNG detection was responsible for about 6.2% of the savings
  3. Dynamic Quality was responsible for about 4.5% of the savings
  4. Switching to the mozjpeg encoder was responsible for about 13.8% of the savings

This adds up to an average image file size reduction of around 30%, which we applied to our largest and most common image resolutions, making the website faster for users and saving terabytes a day in data transfer. As measured at the CDN:

Average filesize over time, as measured from the CDN (combined with non-image static content).

What we didn’t do

This section introduces a few other common improvements you might be able to make that either weren't relevant to Yelp, due to defaults chosen by our tooling, or were tradeoffs we chose not to make.


Subsampling

Subsampling is a major factor in determining both quality and file size for web images. Longer descriptions of subsampling can be found online, but suffice it to say for this blog post that we were already subsampling at 4:1:1 (which is Pillow’s default when nothing else is specified), so we weren’t able to realize any further savings here.

Lossy PNG encoding

After learning what we did about PNGs, preserving some of them as PNG but with a lossy encoder like pngmini could have made sense, but we chose to re-save them as JPEG instead. This is an alternative with reasonable results: 72-85% file size savings over unmodified PNGs, according to the author.

Dynamic content types

Support for more modern content types like WebP or JPEG2k is certainly on our radar. Even once that hypothetical project ships, there will be a long-tail of users requesting these now-optimized JPEG/PNG images which will continue to make this effort well worth it.


SVG

We use SVG in many places on our website, like the static assets created by our designers that go into our styleguide. While this format and optimization tools like svgo are useful for reducing website page weight, it isn’t related to what we did here.

Vendor Magic

There are too many providers to list that offer image delivery, resizing, cropping, and transcoding as a service, including the open-source thumbor. Maybe this will be the easiest way for us to support responsive images and dynamic content types and stay on the cutting edge in the future; for now, our solution remains self-contained.

Further Reading

Two books listed here absolutely stand on their own outside the context of the post, and are highly recommended as further reading on the subject.

Source: Making Photos Smaller Without Quality Loss



How to Build a Medium-Sized Web Application (Full Stack)

The computing world already has plenty of wheels. My view is that technology has no value until it is turned into an application. This post picks out some technologies, reusing a few excellent existing wheels, to build your own web application at minimal cost.

bootstrap: Twitter's responsive framework, for quickly building good-looking front-end interfaces

material-design-lite: Google's Material Design style front-end framework

Front-end libraries / frameworks

jquery: quick and convenient DOM manipulation

yog2: Baidu's front-end build tool, which combines fis3 and express

webpack: the hottest front-end build tool of the moment

node: write back-end applications in JavaScript

express: the officially recommended web framework for node.js

koa: by the original express team; well regarded, though I haven't used it in a real project

mysql: the world's most popular open-source database, used at scale by all the big internet companies

mongo: the most popular NoSQL database of the moment, quite stable after several years of development

knex: a node.js SQL builder for mysql; use it together with the mysql node.js driver

mongoose: the node.js database driver for mongo

ansible: an ssh-based automated deployment tool; I'm still exploring it

ucloud: a cloud provider with a good reputation

Aliyun (阿里云): Alibaba's cloud service, billed as the largest in China

Baidu Cloud (百度云): a plug for my own company's product

Qiniu Cloud (七牛云): a veteran cloud storage provider with a free tier












InfoQ.com is a practitioner-driven community media site. Our goal is to advance and spread knowledge and innovation in software development. As editors and practitioners, we take part in a member-driven editorial process working toward that shared goal, by translating and writing news and by sharing knowledge in many forms, including articles, interviews, and recorded conference videos.






Source: Make your voice heard on InfoQ





Later, Steve Jobs's use of Keynote raised its profile further and further. Jobs used Apple's own Keynote for his keynote addresses at the Macworld Conference and Expo and at other Apple events. His maverick streak, his focus and decisiveness, and his gift for uncovering and presenting the beauty in Apple's products gave Apple fierce momentum and piles of revenue, and directly and indirectly drove Keynote's spread.

Apple laptops, the Mac Air, Mac Pro, and so on, have become must-haves for internet-industry product managers, designers, and iOS developers. Carrying a Mac, holding an iPhone, and presenting your slides in Keynote has become everyone's favorite way to show off.

So now, at the many internet conferences held every year at home and abroad, the Google, Apple, Tesla, Android, and AWS events, "Keynote" has simply replaced "PPT." That is, Keynote is PPT and PPT is Keynote; if you use PowerPoint, you're almost too embarrassed to say hello!








Build keyword backlinks. If that isn't enough, build a network of satellite sites: create many sites whose content is not duplicated but is related, point all of their backlinks at the main site, and don't link the satellite sites to each other.





When optimized well: 8k IPs, 24k UVs.








When the current page has had no mouse or keyboard activity for 3 minutes, pop up the dvi page.










Summary: Instruction reordering is a very important feature of modern compilers. This article explains what reordering optimization is and describes a gcc bug it triggered, which developers should keep in mind.

Summary: Ajax is an extremely common feature whose role in pages keeps growing, and browser support, ease of use, and the size of the lib itself have become the yardsticks for judging one. The author rewrote an ajax library to replace the existing prototype.js.

Summary: This article focuses on the background and usage of an asynchronous connection pool, and introduces the basics of Java NIO.

Summary: XSS (Cross Site Script) is a cross-site scripting attack: a malicious attacker inserts malicious HTML code into a web page, and when a user browses that page, the embedded HTML is executed, achieving the attacker's goals. This article describes the attack and offers some countermeasures.

Summary: Ajaj stands for Asynchronous JavaScript And JavaScript_Text. Like Ajax (for a detailed introduction, see "Ajax: A New Approach to Web Applications"), Ajaj interacts with the server without refreshing the page, and it can also interact across domains.

Building a daily build system with hudson

Summary: A daily build periodically (every day), automatically, and completely compiles and integrates an entire project's code. Using the miscsearch team's experience setting up a hudson server as an example, this article walks through building a daily build system.

Bigmem: using more than 4G of memory on 32-bit

Summary: Because of address-space limits, 32-bit applications cannot directly use more than 4G of physical memory, a severe constraint for performance-sensitive, memory-hungry applications. This article presents a way to use more than 4G of memory on 32-bit, along with an implementation.

Summary: A sequel to the volume 1-1 article "The C10K problem: an introduction to epoll", this article shows how to raise server throughput with pipelining and some locking tricks, and introduces the emerging Lock Free techniques.

Summary: Approaching optimization at the engineering level, this article introduces generic programming and focuses on two techniques, policy classes and type_selector, for optimizing system design and performance. The discussion of policy classes analyzes and compares them in detail with C++ virtual functions; the discussion of type_selector shows how to achieve configurability at the source-code level. The optimizations are carried out at the program-design level, making full use of C++ language features and clever design. The article gives equal weight to time/space optimization and system design methodology, and closes with a mini framework for persistable object/relational mapping (O/R Mapping) that illustrates how these techniques combine in a real project.
