What does mediatype mean? Fixing garbled attachment file names on download

2022-07-21 2:10:06 Network Knowledge Official Administrator


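In web development, most of us have had to implement attachment downloads, and at some point the garbled Chinese file names that different browsers produce after a download have probably caused you real grief.

Search online and most answers inspect the User-Agent request header to determine the browser type, then handle each browser differently, with code like the following: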

// Microsoft browsers
if (agent.contains("msie") || agent.contains("trident") || agent.contains("edge")) {
    // special handling of filename
}
// Firefox
else if (agent.contains("firefox")) {
    // special handling of filename
}
// Safari
else if (agent.contains("safari")) {
    // special handling of filename
}
// Chrome
else if (agent.contains("chrome")) {
    // special handling of filename
}
// Others
else {
    // special handling of filename
}
// Finally, put the specially handled file name into the header
response.setHeader("Content-Disposition", "attachment;fileName=" + filename);

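Code like this looks like magic, though. Why does each browser need different handling? Do we have to add a compatibility branch every time a new browser appears? Is there no common standard to keep these browsers in line?

With these questions in mind, I read through the RFCs and arrived at an elegant solution: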

// percentEncodedFileName is the percent-encoded file name
response.setHeader("Content-disposition", "attachment;filename=" + percentEncodedFileName + ";filename*=utf-8''" + percentEncodedFileName);

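In my tests, this response header works in all the mainstream browsers listed in section 4. Because it is purely an HTTP matter, it is independent of the programming language. Set the response header according to this rule and the annoying garbled Chinese attachment names are solved once and for all.

Next, let me peel back the layers and retrace, by reading the RFCs, how this response header is derived.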

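1. Content-Disposition

Everything starts with RFC 6266, which describes the Content-Disposition response header. It is not actually part of the HTTP standard, but because it is so widely used, this document specifies it. Its grammar is as follows: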

content-disposition = "Content-Disposition" ":"
                       disposition-type *( ";" disposition-parm )

disposition-type    = "inline" | "attachment" | disp-ext-type
                    ; case-insensitive
disp-ext-type       = token

disposition-parm    = filename-parm | disp-ext-parm

filename-parm       = "filename" "=" value
                    | "filename*" "=" ext-value

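disposition-type has two defined values:

inline means default handling, which usually displays the content in the page
attachment means the content should be saved locally; it is used together with filename or filename*

Note the filename and filename* parameters in disposition-parm. The document states that this information can be used as the name of the saved file.

The difference between the two is that the value of filename is not encoded, whereas filename* follows the encoding rules defined in RFC 5987: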

Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1" ([ISO-8859-1]) character set.

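filename* was defined later, and many older browsers do not support it. The document therefore specifies that when both appear in the header field, filename* is used and filename is ignored, which lets filename act as a fallback for older browsers.

At this point the skeleton of the response header is almost in view. Here is an example taken from [RFC6266]: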

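Content-Disposition: attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates

The filename*=utf-8''%e2%82%ac%20rates part deserves a note. It may look strange at first glance, but it simply uses single quotes as delimiters to split the right-hand side of the equals sign into three parts: the first is the character set (utf-8), the middle is the language (left empty), and the last, %e2%82%ac%20rates, is the actual value (the percent-encoded UTF-8 octets of "€ rates"). RFC 2231, section 4 describes this structure in detail: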

A single quote is used to separate the character set, language, and actual value information in the parameter value string, and an percent sign is used to flag octets encoded in hexadecimal.
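To make the three-part structure concrete, here is a minimal sketch of how a recipient might split and decode such a value. The class name ExtValue and its fields are hypothetical names chosen for illustration; they do not come from any RFC or library, and real browsers do considerably more validation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

/** Hypothetical helper: splits a value of the form charset'language'percent-encoded-text. */
final class ExtValue {
    final String charset;   // e.g. "utf-8"
    final String language;  // may be empty
    final String value;     // decoded text

    private ExtValue(String charset, String language, String value) {
        this.charset = charset;
        this.language = language;
        this.value = value;
    }

    static ExtValue parse(String raw) {
        // The first two single quotes delimit the character set and the language.
        int first = raw.indexOf('\'');
        int second = raw.indexOf('\'', first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("expected charset'language'value");
        }
        String charset = raw.substring(0, first);
        String language = raw.substring(first + 1, second);
        String encoded = raw.substring(second + 1);
        return new ExtValue(charset, language, percentDecode(encoded, Charset.forName(charset)));
    }

    /** Turns every %XX triplet back into one octet, then decodes the octets with the given charset. */
    private static String percentDecode(String s, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated percent escape");
                }
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // unescaped characters are expected to be plain ASCII
            }
        }
        return new String(out.toByteArray(), cs);
    }
}
```

Calling ExtValue.parse("utf-8''%e2%82%ac%20rates") yields the charset "utf-8", an empty language, and the value "€ rates".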

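2. PercentEncode

PercentEncode is also known as percent-encoding or URL encoding.

As mentioned above, filename* follows the encoding rules defined in [RFC5987]. Section 3.2 of [RFC5987] defines the character sets that must be supported: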

Recipients implementing this specification MUST support the character sets "ISO-8859-1" and "UTF-8".

[RFC5987] section 3.2.1 also specifies that percent-encoding follows the definition in RFC 3986, section 2.1, quoted here:

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value. For example, "%20" is the percent-encoding for the binary octet "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space character (SP). Section 2.4 describes when percent-encoding and decoding is applied.

Note that [RFC3986] explicitly specifies that a space is percent-encoded as %20.
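The rule quoted above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name PercentEncoder is hypothetical, the input is always converted to UTF-8 octets, and every octet outside the RFC 3986 "unreserved" set (letters, digits, "-", ".", "_", "~") is escaped, which is stricter than the RFCs require but always safe.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal percent-encoder that works octet by octet on UTF-8 input. */
final class PercentEncoder {
    // RFC 3986 "unreserved" characters, which never need escaping.
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF; // treat the byte as an unsigned value
            if (octet < 0x80 && UNRESERVED.indexOf(octet) >= 0) {
                sb.append((char) octet);
            } else {
                // "%" followed by the two hexadecimal digits of the octet
                sb.append('%').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, PercentEncoder.encode("€ rates") returns "%E2%82%AC%20rates": the euro sign becomes three escaped UTF-8 octets and the space becomes %20.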

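However, another document, RFC 1866, section 8.2.1 "The form-urlencoded Media Type", specifies:

The default encoding for all forms is `application/x-www-form-urlencoded'. A form data set is represented in this media type as follows: 1. The form field names and values are escaped: space characters are replaced by `+', and then reserved characters are escaped as per [URL]

This requires that in messages of type application/x-www-form-urlencoded, spaces are replaced by +, and other characters are escaped as defined in [URL]. Here [URL] refers to RFC 1738, and among its successors the latest URL-related document is precisely [RFC3986].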

This is why many documents describe the percent-encoding of a space as either + or %20, for example:

w3schools: URL encoding normally replaces a space with a plus (+) sign or with %20.

MDN: Depending on the context, the character ' ' is translated to a '+' (like in the percent-encoding version used in an application/x-www-form-urlencoded message), or in '%20' like on URLs.

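So the question is: during development, how should we percent-encode the space character?

I suggest following the latest document. The case defined in [RFC1866] applies only to the application/x-www-form-urlencoded type; as far as the definition of percent-encoding itself is concerned, [RFC3986] is authoritative. So anywhere percent-encoding is required, a space should be encoded as %20. There is an answer on Stack Overflow that supports this view: When to encode space to plus (+) or %20?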
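The difference between the two conventions is easy to see in code. The sketch below assumes Java 10 or later (for the URLEncoder.encode(String, Charset) overload); the class and method names are hypothetical. It also shows why replacing + with %20 after form encoding is safe: a literal plus sign in the input has already been escaped as %2B, so any + left in the output can only stand for a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo comparing form encoding with %20-style percent-encoding. */
final class SpaceEncodingDemo {
    /** application/x-www-form-urlencoded, as produced by the JDK: a space becomes "+". */
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Same escapes, but every "+" (which can only mean a space here) is rewritten as "%20". */
    static String percentEncode(String s) {
        return formEncode(s).replace("+", "%20");
    }
}
```

For the input "a b+c", formEncode returns "a+b%2Bc" and percentEncode returns "a%20b%2Bc".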

3. Code in practice

With the theory in place, the code follows naturally. Here it is:

@GetMapping("/downloadFile")
public String download(String serverFileName, HttpServletRequest request, HttpServletResponse response) throws IOException {
    request.setCharacterEncoding("utf-8");
    response.setContentType("application/octet-stream");

    String clientFileName = fileService.getClientFileName(serverFileName);
    // Percent-encode the real file name; URLEncoder emits "+" for a space, so rewrite it as "%20"
    String percentEncodedFileName = URLEncoder.encode(clientFileName, "utf-8")
            .replaceAll("\\+", "%20");

    // Assemble the Content-Disposition value
    StringBuilder contentDispositionValue = new StringBuilder();
    contentDispositionValue.append("attachment; filename=")
            .append(percentEncodedFileName)
            .append(";")
            .append("filename*=")
            .append("utf-8''")
            .append(percentEncodedFileName);
    response.setHeader("Content-disposition", contentDispositionValue.toString());

    // Write the file stream to the response
    try (InputStream inputStream = fileService.getInputStream(serverFileName);
         OutputStream outputStream = response.getOutputStream()) {
        IOUtils.copy(inputStream, outputStream);
    }
    return "OK!";
}

The code is simple, but two points need explaining:

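Why call .replaceAll("\\+", "%20") after URLEncoder.encode(clientFileName, "utf-8")? As established above, anywhere percent-encoding is required, a space should be encoded as %20, yet the documentation of the URLEncoder class states explicitly that it converts a space into +:

The space character " " is converted into a plus sign "{@code +}".

The JDK is not really to blame, because its documentation says that it follows application/x-www-form-urlencoded (PHP has a similar function that behaves the same way):

Translates a string into {@code application/x-www-form-urlencoded} format using a specific encoding scheme. This method uses the

So here we use .replaceAll("\\+", "%20") to rewrite the + signs so that spaces are encoded the way [RFC3986] specifies. All of the steps are spelled out here to make the point clear. You can of course implement your own PercentEncoder class; how elaborate it is is up to you.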
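The [RFC6266] standard says the value of filename= does not need encoding, so why is the value after filename= percent-encoded here? Recall from [RFC6266] that when filename and filename* both appear, the latter is used, and a browser too old to support the new standard falls back to the former. Today's mainstream browsers update themselves automatically, so most of them support the new standard, with the exception of old versions of IE. Old versions of IE percent-decode the value and use the result. The value of filename= is therefore percent-encoded specifically for compatibility with old IE. PS: in my own tests, IE11 and Edge already support the new standard.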

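4. Browser testing

Based on StatCounter's 2019 statistics on browser market share in China, I designed a download file name containing Chinese characters, English letters, and a space, 下载-downtest.txt, to test with.

Test results: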

Browser	Version	pass
Chrome	84.0.4147.125	true
UC	V6.2.4098.3	true
Safari	13.1.2	true
QQBrowser	10.6.1(4208)	true
IE	7-11	true
Firefox	79.0	true
Edge	44.18362.449.0	true
360 Secure Browser 12	12.2.1.362.0	true
Edge(chromium)	84.0.522.59	true