1

I want to remove Control Return and merge the lines of a text file into one line, and also limit the number of characters.

input.txt contains:

comment 1
comment 2 
...
comment n 

output.txt should contain one string:

comment 1 comment 2 ... comment n

But output.txt should be limited to, e.g., 32 characters:

comment 1 comment 2 comment 3 co

Can I use sed, awk, tr, or something else?

George Udosen

4 Answers

1
head -c 32 input.txt | tr '\n' ' ' > output.txt
  • head -c 32 discards all but the first 32 bytes.

  • tr '\n' ' ' replaces all newline characters with space characters.
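As a quick check of the pipeline above, here is a minimal sketch that assumes a four-line, ASCII-only sample file (the file contents are made up for illustration). With ASCII input, bytes and characters coincide, so the result is exactly 32 characters long:

```shell
# Hypothetical sample input: four short ASCII lines.
printf 'comment 1\ncomment 2\ncomment 3\ncomment 4\n' > input.txt

# Keep the first 32 bytes, then turn the remaining newlines into spaces.
head -c 32 input.txt | tr '\n' ' ' > output.txt

cat output.txt; echo    # comment 1 comment 2 comment 3 co
wc -c < output.txt      # 32
```

Note that the newlines are counted before they are replaced, which makes no difference here because each newline becomes exactly one space.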

If you want to limit characters rather than bytes (which matters with multi-byte character encodings), you can use grep instead:

tr '\n' ' ' < input.txt | grep -oEe '^.{,32}' > output.txt
David Foerster
0

The safe and easy way to merge the lines of a text file is to use the paste command:

$ cat input.txt
comment 1
comment 2 
...
comment n

$ paste -s -d ' ' <input.txt >output.txt

$ cat output.txt
comment 1 comment 2 ... comment n

You can change the delimiter -d ' ' to any other character of your choice.
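To illustrate the delimiter option, here is a small sketch using a made-up three-line sample file. The last command also assumes the limit is meant for the merged output and, as one possible approach, truncates it with cut; the limit of 12 is chosen only to keep the example short:

```shell
# Hypothetical sample input: three short lines.
printf 'comment 1\ncomment 2\ncomment 3\n' > input.txt

# Join all lines with a comma instead of a space.
paste -s -d ',' input.txt    # comment 1,comment 2,comment 3

# Join with spaces, then keep only the first 12 characters.
paste -s -d ' ' input.txt | cut -c -12 > output.txt
cat output.txt               # comment 1 co
```

Unlike tr '\n' ' ', paste -s does not turn the final newline into a trailing delimiter.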

As for the 32-character limit, did you really mean it to apply to output.txt? Edit your post if you meant input.txt, in which case head -c 32 could truncate the input to its first 32 bytes before piping it to paste.

canupseq
0

Awk will do fine. One way is:

$ printf 'comment 1\rcomment 2\r...\rcomment n\r' > input.txt
$ awk -v FS="" -v RS="" '{for (i=1;i<=32;i++) printf "%s", ($i == "\r" ? "" : $i)}' input.txt > output.txt
$ cat output.txt 
comment 1comment 2...comment 

Explanation: by default awk processes input line by line, and each line is called a record. Every record is split into columns, and each column is called a field. Fields are referred to by variables numbered from 1, e.g. $1, $2, $3…

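Here you change the default behavior by setting the field separator (FS) to "", which makes awk (at least gawk) treat every character as its own field. Then you set the record separator (RS) to "", which switches awk to paragraph mode: records are separated by blank lines, so as long as the input contains no blank lines, the whole text is a single record and you can address all of its characters at once (i.e. without writing code to handle the input line by line).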

Finally, you can easily operate on characters, so you loop over the fields (i.e. characters), and print only when the character is not a carriage return.
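Splitting into characters with an empty FS is an extension that POSIX leaves unspecified, so it may not work in every awk. As an alternative sketch (not the method above), assuming the input is newline-separated rather than CR-separated and using a made-up sample file, the lines can be joined in a variable and truncated with substr:

```shell
# Hypothetical sample input: four newline-terminated lines.
printf 'comment 1\ncomment 2\ncomment 3\ncomment 4\n' > input.txt

# Append each line to s, separated by a space, then print the first 32 characters.
awk '{ s = s sep $0; sep = " " } END { print substr(s, 1, 32) }' input.txt > output.txt

cat output.txt    # comment 1 comment 2 comment 3 co
```

Here the limit applies to the merged text, whereas the loop above counts the carriage returns toward the 32 characters before dropping them.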

Hi-Angel
  • Why all these carriage-return characters (\r) in the input? The escape sequence for newline characters is \n. – David Foerster Sep 10 '17 at 11:51
  • @DavidFoerster OP asked for carriage return, idk why. – Hi-Angel Sep 10 '17 at 11:52
  • Hmm… you're right. But I think they actually meant line break/newline characters. – David Foerster Sep 10 '17 at 11:52
  • @DavidFoerster well, \r is easy to replace with \n, so it's not a big deal. But FTR, my original answer used \n ☺ Then I noticed OP's "Control Return", whose first letters are suspiciously similar to CR, and quickly replaced it. That edit was not recorded because I made it within the 5-minute grace period. – Hi-Angel Sep 10 '17 at 12:24
0
tr '\n' ' ' < in.txt | cut -c -32
  • tr '\n' ' ': replaces the newlines in the input text with spaces
  • cut -c -32: limit the output to 32 characters
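If the limit needs to vary, the same pipeline could be wrapped in a small shell function. This is only a sketch: merge_lines and the sample file contents are hypothetical names and data introduced here for illustration. Also note that some cut implementations (GNU cut has historically been one) treat -c like -b and count bytes, so the result may differ for multi-byte text:

```shell
# merge_lines FILE LIMIT: hypothetical helper wrapping the pipeline above.
merge_lines() {
    tr '\n' ' ' < "$1" | cut -c "-$2"
}

# Hypothetical sample input: four short ASCII lines.
printf 'comment 1\ncomment 2\ncomment 3\ncomment 4\n' > in.txt

merge_lines in.txt 32 > output.txt
cat output.txt    # comment 1 comment 2 comment 3 co
```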
George Udosen