Sometimes I need to search for files whose names contain accented characters (diacritics in general), usually with locate (the mlocate flavor, Merging Locate; see the warning below regarding plocate). I would like to set it up (maybe in /etc/updatedb.conf) so that it lets me search for these special characters using a certain language mapping, for example:
a == âàáäÂÀÁÄ
e == êèéëÊÈÉË
i == îïíÎÏ
o == ôöóÔÖ
u == ûùüÛÜÙ
c == çÇ
n == ñ
So locate -i liberación should also find file names containing the string liberacion and even liberaciòn.
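To make the mapping idea concrete, here is a minimal Python sketch that expands a plain search term into a regular expression with one character class per mapped letter. It is only an illustration of the mapping, not something locate does itself. The names VARIANTS and accent_insensitive_regex are hypothetical, and ì, ò and ú are my additions to the table above (the liberaciòn example needs ò).

```python
import re

# Base letter -> accented variants, taken from the mapping in the question.
# Lowercase only: case is left to re.IGNORECASE, mirroring locate -i.
# Assumption: "ì", "ò" and "ú" are not in the original table; they are
# added here because the "liberaciòn" example needs "ò".
VARIANTS = {
    "a": "âàáä",
    "e": "êèéë",
    "i": "îïíì",
    "o": "ôöóò",
    "u": "ûùüú",
    "c": "ç",
    "n": "ñ",
}

# Reverse lookup: accented character -> its base letter.
BASE_OF = {v: base for base, vs in VARIANTS.items() for v in vs}


def accent_insensitive_regex(term: str) -> str:
    """Build a regex in which every mapped letter matches all its variants."""
    parts = []
    for ch in term.lower():
        base = BASE_OF.get(ch, ch)
        if base in VARIANTS:
            # One character class per mapped letter, e.g. "[oôöóò]".
            parts.append("[" + base + VARIANTS[base] + "]")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    rx = accent_insensitive_regex("liberación")
    for name in ["Liberacion.txt", "la liberaciòn.pdf", "liberation.txt"]:
        print(name, bool(re.search(rx, name, re.IGNORECASE)))
```

The generated expression could in principle be handed to locate -i --regex, which would avoid typing the regular expression by hand, but I have not checked how well that performs on a large database.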
Notes and assumptions
- The mapping might also cover other characters: ÂÃÄÀÁÅÆ ÇÈÉÊËÌÍÎÏ ÐÑÒÓÔÕÖØÙÚÛÜÝÞ ßàáâãäåæç èéêëìíîïðñòóôõö øùúûüýþÿ.
- This is a common situation in Romance languages like Spanish and French, and also in German.
- I always use a 100% UTF-8 locale.
- I would rather not have to use regular expressions.
- A patch might use ASCII transliterations of Unicode, as Unidecode/cUnidecode do. Most of mlocate is written in C.
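As a rough illustration of the transliteration idea in the last note, the sketch below folds diacritics using only Python's standard unicodedata module. This is not how Unidecode or mlocate work internally; it is an assumption-laden stand-in that decomposes each character (NFD) and drops the combining marks. The function names fold_diacritics and matches are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks: "liberación" becomes "liberacion"."""
    # NFD splits "ó" into "o" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keep the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def matches(term: str, filename: str) -> bool:
    """Accent- and case-insensitive substring test, like an idealised locate -i."""
    return fold_diacritics(term).casefold() in fold_diacritics(filename).casefold()


if __name__ == "__main__":
    print(fold_diacritics("liberación"))
    print(matches("liberación", "Liberaciòn final.odt"))
    # Characters with no canonical decomposition are left untouched.
    print(fold_diacritics("Øß"))
```

Letters such as Æ, Ð, Ø, Þ and ß have no canonical decomposition, so this approach leaves them as they are. A table-based transliteration like Unidecode's is what would be needed to turn those into ASCII.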
Related
- Similar question but using find
- Miloslav Trmač (mlocate developer) says here that the official source code is on pagure.io (and a fork on GitHub).
- I filed an issue on the mlocate repo at Pagure.io to add this feature.
  - Update 2018-02: This can be fixed with this pull request by marcotrevisan. It will add -t/--transliterate support using iconv to match accented characters.
  - Update 2018-03: mlocate with support for --transliterate is now included in Ubuntu 18.04 LTS Bionic Beaver (v2 and v3.1).
[…] grep -F or fgrep to avoid the interpretation of "$CH" as a special character, e.g. grep ^ would match any line but grep -F ^ only matches those that contain the character ^. It may also be easier to use character classes to craft the regular expression, i.e. REG="[$CHARS]" is probably easier than your sed command. Watch out for special characters though! Otherwise a good approach. +1 – David Foerster May 22 '17 at 09:13
[…] plocate, but for some reason I'm getting too many apparently false positives. Also, do you know if it's possible to avoid SC2001 on line 18? – Pablo Bianchi Nov 22 '22 at 23:25