Friday, May 4, 2007

UTF-8 with signature?

Few days ago I was at my customer and we witnessed a wired behavior of excel.

We wanted to display data in excel from the web, so that when you click a link excel will open and display some data in it (not within explorer).

Doing this is pretty simple, all we have to do is create a data grid control, bind it to a data source and let it render.

When we want it to open to excel we have to replace the "pre-render" method with this simple code:
Page.Response.Clear();//clear other HTML form response
Page.Response.AddHeader("Content-Disposition", "attachment;filename=StudentsList.xls");//to open in excel
Page.Response.Charset = "65001";//for UTF-8
myDG.RenderControl(writer);//render grid HTML
Page.Response.End();//End response here.

This works great for most cases.

My client had to display some non-English characters that were all UTF8 encoded. When saving the result HMTL in notepad using UTF8 excel was able to display the special chars ok, but when using the code sample above - we got some wrong data.

After googling around for a while i managed to understand that excel must use UTF8 with signature text and i had to add a signature to it.

So - how do I signature my file as UTF8???

Some more googling allowed me to learn that all I needed to do is add these bytes to the start of the file:
0xEF, 0xBB, 0xBF

Using C#, here is the complete and correct way to do this:

Page.Response.Clear();
Page.Response.ContentType = "application/vnd.ms-excel";
Page.Response.AddHeader("Content-Disposition", "attachment;filename=StudentsList.xls");
Page.Response.Charset = "65001";
byte[] b = new byte[] { 0xEF, 0xBB, 0xBF };
Page.Response.BinaryWrite(b);
myDG.RenderControl(writer);
Page.Response.End();

Now everything should be rendered okay and my file is saved with UTF8 with signature!!!

Hooray!

Well, hope this helps, since I didn't manage to find any document that explains this I thought it might help if I publish one...

Peace, Shai.

14 comments:

Anonymous said...

Hi,

This article really helped me, when creating a .htm file on the fly in a web application.

Thanx a lot!

Anonymous said...

I'm glad you found this googling. I tried for a while and ended up stumbling on the solution myself after having saved the file as utf-8 in notepad and comparing the two.

For some reason excel like my utf saved in notepad file but not the original and after doing a hex comparison, I noticed the three signature bytes and started adding them in the original output.

All works great now!

Jan Aagaard said...

Thanks a lot for the tip.

warlockPT said...

Thanks a lot for the info... I was getting desperate regarding this problem and that 3 byte sig solved my problem

Unknown said...

Can you tell me how to do that in classic ASP. I have the same problem there.

Plese help. You can write comment or send me to zvonimir@idea.hr

whatispunk said...

Awesome! I've been beating my head on the desk for hours about this one.

Faszikam said...

Please excuse my ignorance, but how do I "run" the C# code on my Excel spreadsheet, so that it will give me Unicode in the csv file I export to? Or did tjodalv get an answer in vba that I could use???

Unknown said...
This comment has been removed by the author.
Unknown said...

I manage to solve that problem in classic ASP. We have to convert string to binary object. ASP can work with binary data thanks to ADO Stream.

This is example how to do that:

[code]
' This function convert text to binary object
Function StringToBinary(Text, CharSet)

Const adTypeText = 2
Const adTypeBinary = 1

'Create Stream object
Dim BinaryStream 'As New Stream
Set BinaryStream = CreateObject("ADODB.Stream")

'Specify stream type - we want To save text/string data.
BinaryStream.Type = adTypeText

'Specify charset For the source text (unicode) data.
If Len(CharSet) > 0 Then
BinaryStream.CharSet = CharSet
Else
BinaryStream.CharSet = "UTF-8" ' it can be us-ascii, utf-8 ...
End If

'Open the stream And write text/string data To the object
BinaryStream.Open
BinaryStream.WriteText Text

'Change stream type To binary
BinaryStream.Position = 0
BinaryStream.Type = adTypeBinary

'Ignore first two bytes - sign of
BinaryStream.Position = 0

'Open the stream And get binary data from the object
StringToBinary = BinaryStream.Read

End Function


' This is you string
csv = "id;name;something"

Response.ContentType = "text/csv"
Response.AddHeader "Content-Disposition", "attachment; filename=filename.csv"

binary_object = StringToBinary(csv, "UTF-8")
Response.BinaryWrite(binary_object)

Response.End
[/code]

I hope that helps. It works for me.

Anonymous said...

Many thanks for this. Helped me enormously exporting csv's from PHP.

For any PHP'ers who come this way, you need to output the following at the very start of your file:

echo chr(0xEF).chr(0xBB).chr(0xBF);

Thanks again :-)

Anonymous said...

Thanks Shai...
It helped me very much.
Roi K :)

Anonymous said...

PERFECT! Thanks!

Anonymous said...

Hi you can also do this :

StreamWriter sw = new StreamWriter(fileNameAndpath, true, Encoding.UTF8);

sw.Write(sb.ToString());

etc...

It will add the three byte order mark automaticaly in the beginning of the file.

Grégoire Lafortune

Shai Petel said...

Hi Grégoire,

I think you did not understand the problem, or I do not understand you answer.

The problem here is with writing it to the page response stream object, not my own StreamWriter.

Am I missing something?

Thanks.