Ticket #331 (new defect)
Detecting text encoding of tmp-rsrc on upload
|Reported by:|svr|Owned by:|mpo|
|Component:|Upload Control|Version:|trunk|
|Keywords:|utf-8 upload check text encoding|Cc:|kauri-discuss@…|
I've run into some issues while working with the upload control, more specifically with the tmp-rsrc. Users can upload CSV files for processing, but I have no control over which charset they use (by default it's UTF-8, but MS Excel uses
So I thought I could use the tmp-rsrc to find out exactly which charset the file is in, but apparently it always reports UTF-8. So when I have to process an ISO-8859-1 file, I get garbled characters...
Looking at the source code (UploadNewDataResource.java), I see a TODO about detecting encodings. The default character set is used instead:
    //TODO how can we be sure of the character-set of uploaded text files?
    // needs further investigation and test-cases. (there seems no way to really know)
    // might eventually need some service that actually does detection of real charsets
    uploadRepresentation.setCharacterSet(CharacterSet.DEFAULT);
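One possible direction for that TODO, sketched under the assumption that uploads are only ever UTF-8 or ISO-8859-1 (as in this ticket): decode the bytes strictly as UTF-8 using java.nio's CharsetDecoder with CodingErrorAction.REPORT, and fall back to ISO-8859-1 if decoding fails. This works because non-ASCII ISO-8859-1 text is rarely valid UTF-8, while ISO-8859-1 accepts any byte sequence. It is a heuristic, not real detection, and it cannot distinguish other 8-bit charsets (e.g. Windows-1252, which Excel on Windows commonly writes) from ISO-8859-1. The class and method names `CharsetGuess` / `guess` are hypothetical and not part of Kauri.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: chooses between UTF-8 and ISO-8859-1 only. */
public class CharsetGuess {

    /**
     * Returns UTF-8 if the bytes decode cleanly as UTF-8; otherwise
     * falls back to ISO-8859-1, which maps every byte to a character.
     */
    public static Charset guess(byte[] data) {
        // REPORT makes the decoder throw instead of silently inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8; // pure ASCII also ends up here
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8));   // UTF-8
        System.out.println(guess(latin1)); // ISO-8859-1
    }
}
```

The returned charset could then replace `CharacterSet.DEFAULT` in the call above, for example through Restlet's `CharacterSet.valueOf(charset.name())`, assuming the Restlet version in use provides that factory method. Note that the whole upload has to be read (or at least a representative prefix) before a decision can be made.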
For now I've implemented a simple application-level check of whether a file is UTF-8 (by checking for a BOM), since I'm fairly sure UTF-8 and ISO-8859-1 are the only charsets I will encounter in my case.
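A BOM check like the one described might look roughly as follows; this is a sketch, not the reporter's actual code, and `Utf8Bom` / `skipUtf8Bom` are hypothetical names. Two caveats apply: a BOM is optional in UTF-8 and many tools never write one, so a missing BOM does not prove the file is ISO-8859-1; and Java's UTF-8 decoder does not strip a leading BOM (it decodes to U+FEFF), so the three bytes should be skipped before parsing the CSV.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical helper: detects and consumes a UTF-8 byte-order mark. */
public class Utf8Bom {

    private static final int[] BOM = {0xEF, 0xBB, 0xBF};

    /**
     * Returns true if the stream starts with a UTF-8 BOM and consumes it;
     * otherwise pushes the peeked bytes back and returns false.
     * The stream must have a pushback buffer of at least 3 bytes.
     */
    public static boolean skipUtf8Bom(PushbackInputStream in) throws IOException {
        byte[] head = new byte[BOM.length];
        int n = in.readNBytes(head, 0, head.length); // Java 9+
        boolean hasBom = n == BOM.length
                && (head[0] & 0xFF) == BOM[0]
                && (head[1] & 0xFF) == BOM[1]
                && (head[2] & 0xFF) == BOM[2];
        if (!hasBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: restore the bytes we peeked at
        }
        return hasBom;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', ';', 'b'};
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(withBom), 3);
        System.out.println(skipUtf8Bom(in));  // true
        System.out.println((char) in.read()); // a
    }
}
```

Combining this with a strict UTF-8 decode of the remaining bytes would also catch BOM-less UTF-8 files.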
More robust charset detection is needed at the Kauri level, though.