Ticket #331 (new defect)

Opened 3 years ago

Last modified 13 months ago

Detecting text encoding tmprsrc on upload

Reported by: svr Owned by: mpo
Priority: minor Milestone: 0.5
Component: Upload Control Version: trunk
Keywords: utf-8 upload check text encoding Cc: kauri-discuss@…

Description

Hello,

I've run into an issue with the upload control, more specifically the tmp-rsrc. Users can upload CSV files for processing, but I have no control over which charset they use (the default is utf-8, but MS Excel uses iso-8859-1).
I thought I could use the tmp-rsrc to find out which charset a file actually uses, but apparently it always reports utf-8. So when I have to process an iso-8859-1 file, I get strange characters...

When I look at the source code (UploadNewDataResource.java), I see a todo about detecting encodings. Default encoding is being used.

//TODO how can we be sure of the character-set of uploaded text files?
// needs further investigation and test-cases. (there seems no way to really know)
// might eventually need some service that actually does detection of real charsets
uploadRepresentation.setCharacterSet(CharacterSet.DEFAULT);

At the moment I've implemented a simple check at the application level that tests whether a file is utf-8 (by checking for the BOM), as I'm quite sure utf-8 and iso-8859-1 are the only charsets I will encounter in my case.
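
For illustration, a BOM check of this kind could look roughly like the sketch below. This is not the code used in the application or in kauri; the class and method names (BomCharsetGuess, guess) are hypothetical, and it assumes the uploaded bytes are already in memory and that a file without a UTF-8 BOM can be treated as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of kauri: guesses a charset from the BOM only.
final class BomCharsetGuess {
    private BomCharsetGuess() {}

    /**
     * Returns UTF-8 when the data starts with the UTF-8 byte order mark
     * (EF BB BF); otherwise falls back to ISO-8859-1 (an assumption that
     * only holds when those two charsets are the only candidates).
     */
    static Charset guess(byte[] data) {
        boolean hasUtf8Bom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasUtf8Bom ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }
}
```

The weak point of this approach is that utf-8 files written without a BOM are classified as ISO-8859-1.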

More advanced checks are needed at the kauri level though.
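
One possible direction, given only as a sketch: try to decode the bytes strictly as UTF-8 and fall back to another charset when that fails. The class name Utf8OrLatin1 and the ISO-8859-1 fallback are assumptions for illustration, not an existing kauri API; only standard java.nio.charset classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of kauri: validates the bytes as UTF-8.
final class Utf8OrLatin1 {
    private Utf8OrLatin1() {}

    /**
     * Returns UTF-8 when the whole input decodes as well-formed UTF-8,
     * otherwise ISO-8859-1 (assumed here to be the only other candidate).
     */
    static Charset guess(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad byte sequences instead of silently replacing them.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }
}
```

Because every byte sequence is valid ISO-8859-1, the fallback itself cannot be verified; this only distinguishes "well-formed UTF-8" from "something else", and pure ASCII input is reported as UTF-8.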

Steven

Attachments

4° TRIMESTRE 2011 PRI12T16_Riass_vita.TXT (77.1 KB) - added by anonymous 13 months ago.
riass

Change History

comment:1 Changed 3 years ago by jgou

  • Milestone set to 0.4.1

comment:2 Changed 3 years ago by jgou

  • Milestone changed from 0.4.1 to 0.5

Changed 13 months ago by anonymous

riass

comment:3 Changed 13 months ago by anonymous

please let me know the encoding of this file
