
I have a big log file, and I need to cut part of it and paste it at the end of the same file. So I created a script that uses sed to find the chunks I need, writes them to a temp file, and then deletes that same text from the original file. Then I cat both files together. The problem is that my files are all in UTF-8, but the temp file is created as ANSI (probably because it contains only "normal" characters), and when I join the two files the original file changes to ANSI, which garbles some of the text. I tried changing my locales, but they were already correct:

en_GB.ISO-8859-1... up-to-date
  en_GB.UTF-8... up-to-date

I also tried an endless number of combinations in my scripts, but the result is always the same. I then ran this simple test:

echo "test" >/tmp/test.txt 
echo "ção" >/tmp/test2.txt 

Then I ran:

file /tmp/test.txt  ----> it says ASCII
file /tmp/test2.txt ----> it says UTF-8 Unicode text

When I open the files in EditPlus and check the encoding, it reports ANSI for the first and UTF-8 for the second. But if I cat them together, the output is fine (UTF-8) no matter which order I choose. With my big log file (already in UTF-8), however, when I cat it together with the temp file the result is always ANSI.
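The small test above can be checked at the byte level rather than by what an editor reports. This is a minimal sketch, assuming a scratch directory from `mktemp -d` and hypothetical file names; "ção" is written with octal escapes so the example does not depend on the terminal's encoding.

```shell
# Scratch directory and file names are illustrative only.
dir=$(mktemp -d)
printf 'test\n' > "$dir/ascii.txt"
printf '\303\247\303\243o\n' > "$dir/utf8.txt"   # "ção" as UTF-8 bytes

# cat copies bytes as-is; it performs no encoding conversion.
cat "$dir/ascii.txt" "$dir/utf8.txt" > "$dir/joined.txt"

# 5 bytes ("test" + newline) plus 6 bytes (ç=2, ã=2, o=1, newline=1)
wc -c < "$dir/joined.txt"    # 11

# The tail of the joined file is byte-identical to the UTF-8 input.
tail -c 6 "$dir/joined.txt" | cmp - "$dir/utf8.txt" && echo "UTF-8 bytes unchanged"
```

If the bytes come through unchanged, a different label in an editor most likely reflects that editor's encoding guess, not a change made by cat.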

I thought there might be some kind of automatic encoding detection when no special characters are present, but I can't understand why it changes to ANSI or ASCII when I join them. I'm using Ubuntu 14.04 Server, accessed with PuTTY from a Windows machine.
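The actual script is not shown, so the following is only a sketch of the workflow described (extract with sed, delete from the original, append with cat). The `BEGIN`/`END` markers and the sample log content are hypothetical, and `sed -i` assumes GNU sed as shipped with Ubuntu.

```shell
# Stand-ins for the real log and temp file; contents are made up.
log=$(mktemp)
tmp=$(mktemp)
printf 'line 1\nBEGIN\n\303\247\303\243o\nEND\nline 2\n' > "$log"

# 1. Copy the chunk between the (hypothetical) markers to the temp file.
sed -n '/^BEGIN$/,/^END$/p' "$log" > "$tmp"

# 2. Delete the same chunk from the original file in place (GNU sed).
sed -i '/^BEGIN$/,/^END$/d' "$log"

# 3. Append the chunk to the end of the original file.
cat "$tmp" >> "$log"
rm -f "$tmp"

cat "$log"
```

None of these steps names an encoding: sed and cat pass the bytes through, so a temp file that happens to contain only ASCII does not by itself convert anything it is joined with.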

  • Could you add an example to your post that does not work? – PerlDuck Oct 06 '18 at 16:47
  • Maybe https://stackoverflow.com/q/27072558/5830574 – PerlDuck Oct 06 '18 at 16:51
  • I'm no specialist on encoding, but a file is identified as ASCII simply when it contains only characters in the ASCII range. ASCII characters are encoded as the same single bytes in UTF-8, so in itself it should not be a problem. – Sergiy Kolodyazhnyy Oct 06 '18 at 17:04
  • @PerlDuck As crazy as it seems, I tried that too (or so I thought) and it didn't work. But I double-checked it before answering you and it just worked fine. Almost 3 days going nuts... Thanks for the help. – Lcross Portugal Oct 06 '18 at 17:07
  • The file command reads only a limited number of bytes in order to determine the type of the file. If the initial bytes are only ASCII, then the file type reported will be ASCII even if UTF-8 characters show up later in the file. Linux file systems do not differentiate between ASCII and UTF-8 files. – doneal24 Oct 06 '18 at 17:38
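Because a sampled check can miss late non-ASCII bytes, one way to test a whole file is to run it through iconv, which reads all of the input and exits non-zero on the first invalid sequence. This is a sketch under assumptions: the generated file is a made-up stand-in for a large log, and the variable names are illustrative.

```shell
# Build a large file: many ASCII-only lines, then one UTF-8 line at the very end.
big=$(mktemp)
awk 'BEGIN { for (i = 0; i < 200000; i++) print "plain ascii line" }' > "$big"
printf '\303\247\303\243o\n' >> "$big"   # "ção" as UTF-8 bytes

# Is every byte plain ASCII? iconv fails on any byte above 0x7F.
if iconv -f ASCII -t ASCII "$big" > /dev/null 2>&1; then
    ascii_ok=yes
else
    ascii_ok=no
fi

# Is the whole file valid UTF-8?
if iconv -f UTF-8 -t UTF-8 "$big" > /dev/null 2>&1; then
    utf8_ok=yes
else
    utf8_ok=no
fi

echo "pure ASCII: $ascii_ok, valid UTF-8: $utf8_ok"
```

For this file the result is "pure ASCII: no, valid UTF-8: yes", regardless of where in the file the non-ASCII line sits.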
