38

I am happy that grep does support Perl Compatible Regular Expressions with the -P option.

Is there a reason why the tool sed does not have this feature?

galoget
guettli
  • I found that: https://github.com/chmln/sd – guettli Dec 11 '24 at 14:55
  • PCREs are syntactic sugar that would add unnecessary complexity and worse performance to a tool that works just fine without them. For anything that is difficult to do with sed and BREs or EREs, you could just do it in awk, which has EREs but also compound conditions, variables, and other functionality that make PCREs unnecessary. – Ed Morton May 03 '25 at 17:08
  • @EdMorton that is your opinion. I like PCRE. I like \s \S \b \w \W and *?. grep supports these with -P, too. So why should sed not support it? – guettli May 06 '25 at 12:44
  • @guettli right, that's my opinion. FWIW it's based on 40+ years of Unix development experience, but others may have different experiences of course. As for why not support PCREs: because there's no standard for it, and it adds complexity to the regexp engine and degrades performance compared to EREs. Regarding the constructs you like, GNU sed and GNU awk support \s \S \b (or equivalent \y) \w and \W in EREs. I understand *? can lead to briefer code in some cases, but it's not necessary, which is why most seds/awks aren't implementing it despite it now being defined by POSIX as part of EREs. – Ed Morton May 06 '25 at 14:25
  • Note that it's only GNU grep that supports -P so if you use it you're making your code unnecessarily non-portable to other systems and the man page describes it as somewhat experimental with some unimplemented features ("This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features."), and the manual warns it can produce different output to perl in some cases. So, use at your own risk. – Ed Morton May 06 '25 at 14:34
  • @EdMorton portable shell scripts ... Like writing Ruby which runs in Python at the same time. I am looking for different challenges in my life. – guettli May 08 '25 at 08:02

4 Answers

36

Work-around:

You can use the Pathological Eclectic Rubbish Lister (Perl):

perl -pe 's/../../g' file

or replace in place:

perl -i -pe 's/../../g' file

This works for the cases where I use sed. If things get more complicated I write a small python script.
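
For the "small python script" case, a minimal sketch of the two most common sed operations, substitution (`s/../../g`) and line deletion (`/pattern/d`), might look like the following. The function names `substitute` and `delete_matching` are hypothetical, and Python's `re` syntax is Perl-like but not identical to PCRE.

```python
import re
from typing import Iterable, Iterator


def substitute(lines: Iterable[str], pattern: str, repl: str) -> Iterator[str]:
    """Rough equivalent of: perl -pe 's/pattern/repl/g'."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(repl, line)


def delete_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Rough equivalent of: sed '/pattern/d'."""
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            yield line


if __name__ == "__main__":
    # Hypothetical sample input; a real script would iterate over sys.stdin.
    sample = ["foo food\n", "delete this line\n", "a foo\n"]
    kept = delete_matching(sample, r"delete this line")
    print("".join(substitute(kept, r"\bfoo\b", "bar")), end="")
```

Both helpers are generators, so they can be chained and fed from `sys.stdin` without loading the whole input into memory.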

BTW, I switched to No Shell-Scripting

mivk
guettli
  • Well that's great for substitution, but how would you do other sed stuff in Perl? like for example /delete this line/ d – wjandrea Nov 26 '18 at 15:34
  • 1
    The most promising thing I found after a quick search is s2p (sed to Perl), though I just tried it and the output was VERY verbose. – wjandrea Nov 26 '18 at 16:00
  • 1
    @wjandrea I updated the answer: "This works for the cases where I use sed. If things get more complicated I write a small python script." – guettli Nov 27 '18 at 08:06
  • 2
    @wjandrea: perl -ne 'print unless /delete this line/' – Adrian Pronk Mar 18 '20 at 01:05
  • 1
    FWIW, perl -pie 's/...' somefile fails with Can't open perl script "s/...": No such file or directory, while perl -pi -e 's/...' somefile works fine (perl v5.28.1 on Debian 10). This is probably because -i takes an (optional) backup suffix, so it eats the e option. – Jakob Sep 22 '21 at 06:53
12

In the case of GNU sed, the stated reason appears to be:

I was afraid it fell into one of those 'cracks'...though from what was said at the time, some part of the work was already done and it looked like a matter of docs and packaging... (though, I admit, in Computer Sci, the last 10% of the work often takes 90% of the time...

See GNU bug report logs - #22801 status on committed change: upgrading 'sed' RE's to include perlRE syntax - or search the sed-devel Archives for "PCRE" if you want more details.

Don't forget you can use perl itself for many of the simple one-liners for which you might want to use PCRE in sed.

steeldriver
2

Personally, I found it easier to do in Python than in Perl or sed.

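# write to a temporary file first: piping straight back into the same file can truncate it before it has been read
tmp=$(mktemp)
python3 -c 'import sys, re; s = sys.stdin.read(); s = re.sub(r"regex", "replace string", s); sys.stdout.write(s)' < file > "$tmp"
sudo tee file < "$tmp" > /dev/null && rm "$tmp"

Full example:

# add quay and docker registries to approved cri-o registries
tmp=$(mktemp)
python3 -c 'import sys, re; s = sys.stdin.read(); s = re.sub(r"#registries\s+\=\s+\[\n#\s+\]", "registries = [\"docker.io\",\"quay.io\"]", s); sys.stdout.write(s)' < /etc/crio/crio.conf > "$tmp"
sudo tee /etc/crio/crio.conf < "$tmp" > /dev/null && rm "$tmp"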
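
Reading and rewriting the same file in one shell pipeline is fragile, so doing the whole edit inside Python may be safer. The sketch below uses a hypothetical helper, `sub_in_place`, which writes the result to a temporary file in the same directory and then renames it over the original. It copies the permission bits but not the ownership, and it would need to run with enough privileges for files such as /etc/crio/crio.conf.

```python
import os
import re
import shutil
import tempfile


def sub_in_place(path: str, pattern: str, repl: str) -> int:
    """Apply re.subn to the whole file at `path`; return the replacement count."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    new_text, count = re.subn(pattern, repl, text)
    if count:
        # Create the temp file next to the original so the rename
        # stays on one filesystem.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # atomic rename on POSIX systems
    return count
```

With this helper, the cri-o edit above could be written as `sub_in_place("/etc/crio/crio.conf", r"#registries\s+=\s+\[\n#\s+\]", 'registries = ["docker.io","quay.io"]')`. The file is left untouched when the pattern does not match.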

jmcgrath207
  • 3
    It's a matter of taste and habits I guess, but taking your 3 line example, I prefer the much simpler Perl version: sudo perl -0777 -i.bak -pe 's/#(registries\s*\=)\s*\[\s*#\s*\]/$1 ["docker.io","quay.io"]/;' crio.conf. (The -0777 makes it read the whole file, -i.bak edits in place and keeps a backup file with a ".bak" extension, and $1 is the part in parentheses.) And it's about half the characters to type :-) – mivk Oct 19 '21 at 15:07
2

As my substitution needs have become more complex, perl -pe has become preferable to sed -e. In particular, Perl's character classes and quantifiers are more concise than the hoops I need to jump through for sed.

journalctl -u auditd -S 'yesterday' |\
  perl -pe 's/^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) ([\w-]+) audispd/$1 generic-hostname audispd/;
      s/node=[\w-]+/node=generic-hostname/;'

vs

journalctl -u auditd -S "yesterday" |\
  sed -e 's/^\([[:alnum:]_]\{3\} [[:digit:]]\{2\} [[:digit:]]\{2\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}\) \([[:alnum:]_-]\+\) audispd/\1 generic-hostname audispd/;
      s/node=[[:alnum:]_-]\+/node=generic-hostname/;'

I could use [0-9] instead of [[:digit:]] and [A-Za-z] instead of [[:alpha:]], but a) both of those are longer than the Perl equivalents and b) [A-Za-z] will not match non-ASCII characters the way the Perl equivalents can.

bosses-r-dum> echo 'å' | sed -e 's/[A-Za-z]/X/'
å
bosses-r-dum> echo 'å' | perl -CS -pe 's/\w/X/'
X

If you have to deal with Unicode, being able to add a flag and have things "Just Work" is very handy. I tend to grow my regexps organically, so using the same tool for 'simple' and 'complex' regexps makes sense: a 'simple' regexp can easily turn into a 'complex' one if/when requirements change, and then I don't need to make any tooling changes (such as changing all [x]\{#\} instances into [x]{#}).
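
The same contrast between an ASCII range and a Unicode-aware word class can be reproduced in Python, which other answers here fall back on. This is only an illustrative sketch: the variable names are made up, and it relies on Python 3's `re` treating `\w` as Unicode-aware for `str` patterns unless `re.ASCII` is given.

```python
import re

sample = "\u00e5"  # the single precomposed character å

ascii_letter = re.compile(r"[A-Za-z]")         # explicit ASCII range
word_char = re.compile(r"\w")                  # Unicode-aware by default
ascii_word_char = re.compile(r"\w", re.ASCII)  # \w limited to [A-Za-z0-9_]

print(ascii_letter.sub("X", sample))     # unchanged: å is outside A-Z and a-z
print(word_char.sub("X", sample))        # replaced: å counts as a word character
print(ascii_word_char.sub("X", sample))  # unchanged: re.ASCII narrows \w
```

The flag plays a role similar to -CS in the Perl example, except that here the Unicode-aware behaviour is the default and the flag turns it off.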