Making Photos Smaller Without Quality Loss – by Yelp

Yelp has over 100 million user-generated photos ranging from pictures of dinners or haircuts, to one of our newest features, #yelfies. These images account for a majority of the bandwidth for users of the app and website, and represent a significant cost to store and transfer. In our quest to give our users the best experience, we worked hard to optimize our photos and were able to achieve a 30% average size reduction. This saves our users time and bandwidth and reduces our cost to serve those images. Oh, and we did it all without reducing the quality of these images!

Background

Yelp has been storing user-uploaded photos for over 12 years. We save lossless formats (PNG, GIF) as PNGs and all other formats as JPEG. We use Python and Pillow for saving images, and start our story of photo uploads with a snippet like this:
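The snippet itself is not reproduced above; a minimal sketch of that starting point, with names of our own choosing (`save_image` is not Yelp's actual function), might look like:

```python
from io import BytesIO

from PIL import Image

def save_image(image: Image.Image) -> bytes:
    """Keep lossless sources (PNG, GIF) as PNG; save everything else as JPEG."""
    buf = BytesIO()
    if image.format in ("PNG", "GIF"):
        image.save(buf, format="PNG")
    else:
        # JPEG output requires RGB mode.
        image.convert("RGB").save(buf, format="JPEG", quality=85)
    return buf.getvalue()
```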

With this as a starting point, we began to investigate potential optimizations on file size that we could apply without a loss in quality.

Optimizations

First, we had to decide whether to handle this ourselves or let a CDN provider magically change our photos. With the priority we place on high quality content, it made sense to evaluate options and make potential size vs quality tradeoffs ourselves. We moved ahead with research on the current state of photo file size reduction – what changes could be made and how much size / quality reduction was associated with each. With this research completed, we decided to work on three primary categories. The rest of this post explains what we did and how much benefit we realized from each optimization.

  1. Changes in Pillow
    • Optimize flag
    • Progressive JPEG
  2. Changes to application photo logic
    • Large PNG detection
    • Dynamic JPEG quality
  3. Changes to JPEG encoder
    • Mozjpeg (trellis quantization, custom quantization matrix)

Changes in Pillow

Optimize Flag

This is one of the easiest changes we made: enabling the setting in Pillow responsible for additional file size savings at the cost of CPU time (optimize=True). Due to the nature of the tradeoff being made, this does not impact image quality at all.

For JPEG, this flag instructs the encoder to find the optimal Huffman coding by making an additional pass over each image scan. Each first pass, instead of writing to the file, calculates the occurrence statistics of each value, the information required to compute the ideal coding. PNG internally uses zlib, so the optimize flag in that case effectively tells the encoder to use gzip -9 instead of gzip -6.

This is an easy change to make but it turns out that it is not a silver bullet, reducing file size by just a few percent.
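A quick sketch of the flag in use (the gradient image and helper below are our own illustration, not Yelp's code):

```python
from io import BytesIO

from PIL import Image

def jpeg_size(image: Image.Image, optimize: bool) -> int:
    """Encode to JPEG in memory and return the byte count."""
    buf = BytesIO()
    image.save(buf, format="JPEG", quality=85, optimize=optimize)
    return buf.tell()

# A gradient gives the Huffman optimizer realistic statistics to work with.
img = Image.new("RGB", (64, 64))
img.putdata([(x * 4, y * 4, (x + y) * 2) for y in range(64) for x in range(64)])

# With optimize=True the encoder computes per-image Huffman tables,
# which are typically smaller than the spec's example tables.
print(jpeg_size(img, optimize=True) <= jpeg_size(img, optimize=False))  # → True
```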

Progressive JPEG

When saving an image as a JPEG, there are a few different types you can choose from:

  • Baseline JPEG images load from top to bottom.
  • Progressive JPEG images load from more blurry to less blurry. The progressive option can easily be enabled in Pillow (progressive=True). As a result, there is a perceived performance increase (that is, it’s easier to notice when an image is partially absent than it is to tell it’s not fully sharp).

Additionally, the way progressive files are packed generally results in a small reduction in file size. As the Wikipedia article explains more fully, the JPEG format uses a zigzag pattern over the 8×8 blocks of pixels to do entropy coding. When the values of those blocks of pixels are unpacked and laid out in order, you generally have non-zero numbers first and then sequences of 0s, with that pattern repeating and interleaved for each 8×8 block in the image. With progressive encoding, the order of the unwound pixel blocks changes. The higher-value numbers for each block come first in the file (which gives the earliest scans of a progressive image their distinct blockiness), and the longer spans of small numbers, including more 0s, that add the finer details come toward the end. This reordering doesn't change the image itself, but it does increase the number of 0s that can appear in a row (which compress more easily).
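In Pillow the switch is a single keyword; a small sketch that round-trips a progressive file:

```python
from io import BytesIO

from PIL import Image

img = Image.new("RGB", (32, 32), (200, 120, 40))
buf = BytesIO()
img.save(buf, format="JPEG", quality=85, optimize=True, progressive=True)

# Pillow records whether the decoded file was progressive in .info.
reloaded = Image.open(BytesIO(buf.getvalue()))
print(bool(reloaded.info.get("progressive")))  # → True
```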

Comparison with a delicious user-contributed image of a donut (click for larger):

(left) A mock of how a baseline JPEG renders.

(right) A mock of how a progressive JPEG renders.

Changes to Application Photo Logic

Large PNG Detection

Yelp targets two image formats for serving user-generated content – JPEG and PNG. JPEG is a great format for photos but generally struggles with high-contrast design content (like logos). By contrast, PNG is fully-lossless, so great for graphics but too large for photos where small distortions are not visible. In the cases where users upload PNGs that are actually photographs, we can save a lot of space if we identify these files and save them as JPEG instead. Some common sources of PNG photos on Yelp are screenshots taken by mobile devices and apps that modify photos to add effects or borders.

(left) A typical composited PNG upload with logo and border. (right) A typical PNG upload from a screenshot.

We wanted to reduce the number of these unnecessary PNGs, but it was important to avoid overreaching and changing format or degrading quality of logos, graphics, etc. How can we tell if something is a photo? From the pixels?

Using an experimental sample of 2,500 images, we found that a combination of file size and unique pixels worked well to detect photos. We generate a candidate thumbnail image at our largest resolution and see if the output PNG file is larger than 300KiB. If it is, we’ll also check the image contents to see if there are over 2^16 unique colors (Yelp converts RGBA image uploads to RGB, but if we didn’t, we would check that too).

In the experimental dataset, these hand-tuned thresholds to define “bigness” captured 88% of the possible file size savings (i.e. our expected file size savings if we were to convert all of the images) without any false-positives of graphics being converted.
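A sketch of the two checks (the thresholds are from the post; the function name and structure are our own assumptions):

```python
import os
from io import BytesIO

from PIL import Image

SIZE_THRESHOLD = 300 * 1024   # 300 KiB of PNG output
COLOR_THRESHOLD = 2 ** 16     # unique-color cutoff

def looks_like_photo(image: Image.Image) -> bool:
    """Big PNG output plus many unique colors suggests a photo, not a graphic."""
    buf = BytesIO()
    image.save(buf, format="PNG")
    if buf.tell() <= SIZE_THRESHOLD:
        return False
    # getcolors() returns None once the count exceeds maxcolors.
    return image.getcolors(maxcolors=COLOR_THRESHOLD) is None

# Demo: incompressible noise reads as a "photo"; a flat graphic does not.
noise = Image.frombytes("RGB", (512, 512), os.urandom(512 * 512 * 3))
flat = Image.new("RGB", (512, 512), "white")
print(looks_like_photo(noise), looks_like_photo(flat))  # → True False
```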

Dynamic JPEG Quality

The first and most well-known way to reduce the size of JPEG files is a setting called quality. Many applications capable of saving to the JPEG format specify quality as a number.

Quality is somewhat of an abstraction. In fact, there are separate qualities for each of the color channels of a JPEG image. Quality levels 0 – 100 map to different quantization tables for the color channels, determining how much data is lost (usually high frequency). Quantization in the signal domain is the one step in the JPEG encoding process that loses information.

The simplest way to reduce file size is to reduce the quality of the image, introducing more noise. Not every image loses the same amount of information at a given quality level though.

We can dynamically choose a quality setting which is optimized for each image, finding an ideal balance between quality and size. There are two ways to do this:

  • Bottom-up: These are algorithms that generate tuned quantization tables by processing the image at the 8×8 pixel block level. They calculate both how much theoretical quality was lost and how that lost data either amplifies or cancels out to be more or less visible to the human eye.
  • Top-down: These are algorithms that compare an entire image against an original version of itself and detect how much information was lost. By iteratively generating candidate images with different quality settings, we can choose the one that meets a minimum evaluated level by whichever evaluation algorithm we choose.

We evaluated a bottom-up algorithm, which in our experience did not yield suitable results at the higher end of the quality range we wanted to use (though it may still have potential in the mid-range of image qualities, where an encoder can be more adventurous with the bytes it discards). Many of the scholarly papers on this strategy were published in the early 90s, when computing power was at a premium, and took shortcuts that the top-down approach addresses, such as not evaluating interactions across blocks.

So we took the second approach: use a bisection algorithm to generate candidate images at different quality levels, and evaluate each candidate's drop in quality by computing its structural similarity metric (SSIM) with pyssim, until that value meets a configurable but static threshold. This lets us selectively lower the average file size (and average quality) only for images that started out above a perceivable threshold.

In the below chart, we plot the SSIM values of 2500 images regenerated via 3 different quality approaches.

  1. The original images made by the current approach at quality = 85 are plotted as the blue line.
  2. An alternative approach to lowering file size, changing quality = 80, is plotted as the red line.
  3. And finally, the approach we ended up using, dynamic quality, SSIM 80-85, in orange, chooses a quality for the image in the range 80 to 85 (inclusive) based on meeting or exceeding an SSIM ratio: a pre-computed static value that made the transition occur somewhere in the middle of the images range. This lets us lower the average file size without lowering the quality of our worst-quality images.

SSIMs of 2500 images with 3 different quality strategies.

SSIM?

There are quite a few image quality algorithms that try to mimic the human vision system. We’ve evaluated many of these and think that SSIM, while older, is most suitable for this iterative optimization based on a few characteristics:

  1. Sensitive to JPEG quantization error
  2. Fast, simple algorithm
  3. Can be computed on PIL native image objects without converting images to PNG and passing them to CLI applications (see #2)

Example Code for Dynamic Quality:
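The original snippet is not reproduced above; here is a sketch of the bisection loop with a stand-in similarity metric (normalized RMS error) where the real pipeline used SSIM via pyssim; the function names and the 0.98 threshold are our assumptions:

```python
from io import BytesIO

from PIL import Image, ImageChops

def similarity(a: Image.Image, b: Image.Image) -> float:
    """Stand-in metric: 1 minus normalized RMS error (Yelp used SSIM via pyssim)."""
    diff = ImageChops.difference(a.convert("L"), b.convert("L"))
    hist = diff.histogram()
    total = sum(hist)
    mean_sq = sum(count * value ** 2 for value, count in enumerate(hist)) / total
    return 1.0 - (mean_sq ** 0.5) / 255.0

def encode_jpeg(image: Image.Image, quality: int) -> BytesIO:
    """Encode a candidate at the given quality and rewind for reopening."""
    buf = BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return buf

def dynamic_quality(image: Image.Image, lo: int = 80, hi: int = 85,
                    threshold: float = 0.98) -> int:
    """Bisect [lo, hi] for the lowest quality whose similarity meets threshold."""
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = Image.open(encode_jpeg(image, mid))
        if similarity(image, candidate) >= threshold:
            best, hi = mid, mid - 1   # good enough; try lower quality
        else:
            lo = mid + 1              # too lossy; raise quality
    return best
```

A production version would swap `similarity` for a real SSIM implementation and run on full-resolution candidates.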

There are a few other blog posts about this technique; here is one by Colt McAnlis. And as we go to press, Etsy has published one too! High five, faster internet!

Changes to JPEG Encoder

Mozjpeg

Mozjpeg is an open-source fork of libjpeg-turbo, which trades execution time for file size. This approach meshes well with the offline batch approach to regenerating images. With the investment of about 3-5x more time than libjpeg-turbo, a few more expensive algorithms make images smaller!

One of mozjpeg’s differentiators is the use of an alternative quantization table. As mentioned above, quality is an abstraction of the quantization tables used for each color channel. All signs point to the default JPEG quantization tables as being pretty easy to beat. In the words of the JPEG spec:

These tables are provided as examples only and are not necessarily suitable for any particular application.

So naturally, it shouldn’t surprise you to learn that these tables are the default used by most encoder implementations… 🤔🤔🤔

Mozjpeg has gone through the trouble of benchmarking alternative tables for us, and uses the best performing general-purpose alternative for images it creates.

Mozjpeg + Pillow

Most Linux distributions ship libjpeg by default, so using mozjpeg under Pillow doesn't work out of the box, but configuring it isn't terribly difficult either. When you build mozjpeg, use the --with-jpeg8 flag and make sure the resulting library can be found and linked by Pillow. If you're using Docker, you might have a Dockerfile like:
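The Dockerfile itself is not reproduced above; a sketch of what it might look like (the base image, the mozjpeg version, and the download URL are assumptions, not Yelp's actual configuration):

```dockerfile
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y \
        build-essential curl nasm python-dev python-pip

# Build mozjpeg with the libjpeg v8 ABI and install it system-wide.
RUN curl -L https://github.com/mozilla/mozjpeg/releases/download/v3.2/mozjpeg-3.2-release-source.tar.gz \
        | tar xz \
    && cd mozjpeg \
    && ./configure --with-jpeg8 --prefix=/usr \
    && make install

# Building Pillow from source lets it link against mozjpeg's libjpeg.
RUN pip install --no-binary :all: Pillow
```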

That’s it! Build it and you’ll be able to use Pillow backed by mozjpeg within your normal image workflow.

Impact

How much did each of those improvements matter for us? We started this research by randomly sampling 2,500 of Yelp’s business photos to put through our processing pipeline and measure the impact on file size.

  1. Changes to Pillow settings were responsible for about 4.5% of the savings
  2. Large PNG detection was responsible for about 6.2% of the savings
  3. Dynamic Quality was responsible for about 4.5% of the savings
  4. Switching to the mozjpeg encoder was responsible for about 13.8% of the savings

This adds up to an average image file size reduction of around 30%, which we applied to our largest and most common image resolutions, making the website faster for users and saving terabytes a day in data transfer. As measured at the CDN:

Average filesize over time, as measured from the CDN (combined with non-image static content).

What we didn’t do

This section introduces a few other common improvements you might be able to make that either weren't relevant to Yelp because of defaults chosen by our tooling, or that involved tradeoffs we chose not to make.

Subsampling

Subsampling is a major factor in determining both quality and file size for web images. Longer descriptions of subsampling can be found online, but suffice it to say for this blog post that we were already subsampling at 4:1:1 (which is Pillow’s default when nothing else is specified) so we weren’t able to realize any further savings here.

Lossy PNG encoding

After learning what we did about PNGs, choosing to keep some of them as PNGs but run them through a lossy encoder like pngmini could have made sense, but we chose to resave them as JPEG instead. It's a reasonable alternative, with 72-85% file size savings over unmodified PNGs according to the author.

Dynamic content types

Support for more modern content types like WebP or JPEG2k is certainly on our radar. Even once that hypothetical project ships, there will be a long tail of users requesting these now-optimized JPEG/PNG images, which will continue to make this effort well worth it.

SVG

We use SVG in many places on our website, like the static assets created by our designers that go into our styleguide. While this format and optimization tools like svgo are useful to reduce website page weight, it isn’t related to what we did here.

Vendor Magic

There are too many providers to list that offer image delivery / resizing / cropping / transcoding as a service, including the open-source thumbor. Maybe this will be the easiest way for us to support responsive images and dynamic content types and to stay on the cutting edge in the future. For now, our solution remains self-contained.

Further Reading

Two books listed here absolutely stand on their own outside the context of the post, and are highly recommended as further reading on the subject.

Source: Making Photos Smaller Without Quality Loss

Building a Web Application with Full-Stack Technologies

How to Build a Medium-Sized Web Application (Full Stack)

The computing world already has plenty of wheels. My view is that technology has no value until it is turned into applications. This article picks out a set of technologies, reusing some excellent existing wheels, to build your own web application at minimal cost.

Main Topics

UI Design

bootstrap: a responsive framework from Twitter for quickly building good-looking front-end interfaces

material-design-lite: Google's front-end framework in the Material Design style

Front-End Libraries / Frameworks

jquery: convenient, fast DOM manipulation

Front-End Build Tools

yog2: a front-end build tool from Baidu that combines fis3 and express

webpack: the hottest front-end build tool of the moment

Backend Language

node: write backend applications in JavaScript

Backend Frameworks

express: the web framework officially recommended for Node.js

koa: from the original express team, with a good reputation; I haven't used it in a real project

Databases

mysql: the world's most popular open-source database, used at scale by all the major internet companies

mongo: the most popular NoSQL database today, now quite stable after several years of development

Database Drivers

knex: a Node.js SQL query builder for mysql, used together with the mysql Node.js driver

mongoose: a Node.js driver (ODM) for mongo

Deployment

ansible: an SSH-based automated deployment tool; I'm still exploring it

Cloud Hosting

ucloud: a well-regarded cloud provider

阿里云 (Aliyun): Alibaba's cloud service, claimed to be the largest in China

百度云 (Baidu Cloud): a plug for my own company's product

CDN

七牛云 (Qiniu Cloud): a veteran cloud storage provider with a free tier

Original article: https://github.com/Arnoldnuo/how-to-make-web-app

 

Let Your Voice Be Heard on InfoQ

Contributing

InfoQ is driven by its community; the content here comes from professional practitioners like you. To the community, you are what matters most.

Are you passionate about software development? Do you enjoy sharing your knowledge and experience with others? InfoQ is always looking for excellent authors and enthusiastic editors to join our content team and, together with other members of the community, advance the spread of knowledge and innovation in software development.

There are two ways to contribute content to InfoQ; for both, the contact address is editors@cn.infoq.com:

Submitting In-Depth Articles

In-depth articles are not simple how-tos.
Authors need a very deep understanding of a particular field,
or very rich hands-on experience.
See the InfoQ China submission guidelines for details.

Becoming a Community Editor

InfoQ community editors dig up hot news and notable people in the fields they follow, and present the facts plainly for readers to share. Community editors are organized into translation, original-news, and expert-review groups, among others; for details see:
the InfoQ editorial team onboarding guide

The Core Values of InfoQ Editors

InfoQ.com is a practitioner-driven community media outlet. Our goal is to advance and spread knowledge and innovation in software development. As editors who are also practitioners, we take part in a member-driven editorial process toward that shared goal, by translating and writing news and sharing knowledge in many forms: articles, interviews, recorded conference videos, and more.

We hold to and practice the following core values:

Be the Robin Hood of information. Our main job is to find information held by the few and publish it to the many. When we discover great information that the whole community deserves to know about, we should treat publishing it as a duty we must carry out.

Be the best, not the fastest. We are not a breaking-news site. When something happens, publishing a few days later is fine, as long as we have gathered enough material and can provide deeper content.

Be facilitators, not leaders. We write content that showcases the community's newcomers and veterans, its events, and its ideas, helping the community grow rather than drawing attention to ourselves. As facilitators we collaborate with the community to produce valuable content, instead of only pushing our own ideas; we can and should support existing industry events and trends.

Provide trustworthy content. Our content will be free of bias toward individuals or vendors, unless it is clearly marked as opinion. Readers can expect InfoQ content to be helpful and grounded in fact. We strive to uphold journalistic principles in our news writing, while recognizing that we are not full-time professional journalists, but practitioners trying to do the right thing.

Source: Let Your Voice Be Heard on InfoQ

What Is Keynote?

Actually, I'm writing this question down mainly to answer it for myself.

As I vaguely recall, Keynote began as presentation (slideshow) software from Apple, running on macOS. It supports nearly every image format and font, makes interfaces and designs more graphical, and, thanks to macOS's built-in graphics technologies such as Quartz, produces slides that catch the eye more easily. Keynote also offers true 3D transitions: when switching slides, users can choose from effects such as a rotating cube.

Later, Steve Jobs's use of it made it more and more famous. Jobs used Apple's own Keynote for his keynote addresses at Macworld Conference and Expo and at other Apple events. His independent streak, his focus and decisiveness, and his gift for uncovering and presenting the beauty in Apple's products drove Apple's momentum and revenue, and directly and indirectly spread Keynote along with them.

Apple laptops (the MacBook Air, MacBook Pro, and so on) have become must-haves for internet-industry product managers, designers, and iOS developers: carry a Mac, hold an iPhone, present your slides in Keynote. It has become everyone's favorite tool for showing off.

So now, at the many conferences held at home and abroad every year, whether for Google, Apple, Tesla, Android, or AWS, "keynote" has simply replaced "PPT". In other words: Keynote is PPT and PPT is Keynote, and if you use PowerPoint you'd be embarrassed to say hello!

As the picture shows, Keynote is unstoppable.

-end-

Improving Site Rankings by Technical Means

Q:

How do I improve the ranking of an article that is already ranked? My site is new. One article used to rank second but has now dropped to sixth (this is its ranking on 360 Search; the site is only about ten days old, so Baidu hasn't indexed it yet). The article brings me one or two referrals a day, and having tasted a little success I'm somewhat excited, but I can't get the ranking back up. Is there any way to raise it? I've already put a lot of time into writing the article, the content is decent, and it's well targeted. I'm not posting the site's address because I don't want people with bad intentions to mess with it. Could you offer some guidance? Thanks!

A:

Build keyword backlinks. If that doesn't work, build a network of sites: create more sites with related but non-duplicate content, point all their backlinks at the main site, and don't interlink the satellite sites with each other.

You can give the methods above a try.

Q:

For $800 a month,
roughly how many IPs and UVs per day does that take?

A:

If well optimized, about 8k IPs and 24k UVs a day;
if not, add about 30% more volume.

A:

In the end, what matters most is high-quality articles!

-end-

 

A 3-Minute Page-Idle Effect

How to implement this effect:

when the current page has had no mouse or keyboard activity for 3 minutes, pop up a div overlay

with the message "This page has been idle for more than 3 minutes; press any key or click a blank area to return to the page",

and show several related items at the same time.

Sogou Labs Technical Reports

I stumbled upon Sogou Labs and its technical reports by accident. Not bad at all; a conscientious company. Kudos.

 

The C10K Problem: An Introduction to epoll

Summary: When writing high-load servers with huge connection counts, the classic multi-threaded and select models no longer apply. They should be abandoned in favor of epoll/kqueue/dev_poll for capturing I/O events.

Out-of-Order Optimization and a GCC Bug

Summary: Reordering optimizations are a very important feature of modern compilers. This article explains what out-of-order optimization is, and describes a GCC bug it triggered, in the hope of bringing it to developers' attention.

A Lightweight AJAX Library

Summary: Ajax is an extremely common feature that plays an ever greater role in web pages, so browser support, ease of use, and the library's own size have become the yardsticks for judging an Ajax lib. The author rewrote an ajax library to replace the existing prototype.js.

An Asynchronous Connection Pool Built on Java NIO

Summary: This article focuses on the background and usage of an asynchronous connection pool, and introduces the basics of Java NIO along the way.

XSS Cross-Site Scripting Attacks and Defenses

Summary: XSS (Cross Site Script) is a cross-site scripting attack: a malicious attacker injects malicious HTML into a web page, and when a user views that page, the embedded HTML executes, achieving the attacker's particular goal. This article describes the attack and offers some defenses.

Ajaj: Cross-Domain Access

Summary: Ajaj stands for Asynchronous JavaScript And JavaScript_Text. Like Ajax (for a detailed introduction see "Ajax: A New Approach to Web Applications"), Ajaj interacts with the server without refreshing the page, and it can also work across domains.

Building a Daily-Build System with Hudson

Summary: A daily build compiles and integrates a project's entire codebase periodically (every day), fully automatically, and completely. Using the miscsearch group's experience setting up a Hudson server as an example, this article walks through building a daily-build system.

Bigmem: Using More Than 4GB of Memory on 32-bit Systems

Summary: Because of address-space limits, 32-bit applications cannot directly use more than 4GB of physical memory, a serious constraint for performance-critical, memory-hungry applications. This article presents a method for using more than 4GB of memory on 32-bit systems, along with an implementation.

C10K and High-Performance Programs, Continued

Summary: A sequel to "The C10K Problem: An Introduction to epoll" (volume 1-1), this article describes how to raise server throughput with pipelining and some locking tricks, as well as the emerging lock-free techniques.

Engineering Optimization Methods Based on Generic Programming

Summary: Approaching system optimization at the engineering level, this article introduces generic programming techniques, focusing on two methods, policy classes and type_selector, for improving system design and performance. The discussion of policy classes analyzes and contrasts them in detail with C++ virtual functions; the discussion of type_selector explains how to achieve configurability at the source-code level. The optimizations are carried out at the program-design level, making full use of C++ language features and clever design. The article gives equal weight to time/space-complexity optimization and system design methodology, and closes with a mini framework for object-relational mapping (O/R Mapping) of persistable objects, illustrating how these techniques and methods combine in real projects.

