PostgreSQL dictionary consists of 3 files:
ISPELL_FILE=`curl https://sjp.pl/slownik/ort/ | grep -o ">sjp-ispell-pl-.*-src.tar.bz2" | cut -c2-`
wget https://sjp.pl/slownik/ort/`echo $ISPELL_FILE`
tar xvjf sjp-ispell-pl-*-src.tar.bz2
cd sjp-ispell-pl-[0-9]*
iconv -f ISO_8859-2 -t utf-8 polish.aff > polish.affix
iconv -f ISO_8859-2 -t utf-8 polish.all > polish.dict
curl https://raw.githubusercontent.com/dominem/stopwords/master/polish.stopwords.txt > polish.stop
cp polish.affix `pg_config --sharedir`/tsearch_data/
cp polish.dict `pg_config --sharedir`/tsearch_data/
cp polish.stop `pg_config --sharedir`/tsearch_data/
ISPELL_FILE=`curl https://sjp.pl/slownik/ort/ | grep -o ">sjp-ispell-pl-.*-src.tar.bz2" | cut -c2-`
wget https://sjp.pl/slownik/ort/`echo $ISPELL_FILE`
tar xvjf sjp-ispell-pl-*-src.tar.bz2
cd sjp-ispell-pl-[0-9]*
iconv -f ISO_8859-2 -t utf-8 polish.aff > polish.affix
iconv -f ISO_8859-2 -t utf-8 polish.all > polish.dict
curl https://raw.githubusercontent.com/dominem/stopwords/master/polish.stopwords.txt > polish.stop
cp polish.affix `pg_config --sharedir`/tsearch_data/
cp polish.dict `pg_config --sharedir`/tsearch_data/
cp polish.stop `pg_config --sharedir`/tsearch_data/
The following SQL does the trick:
BEGIN;
CREATE TEXT SEARCH CONFIGURATION public.polish ( COPY = pg_catalog.english );
CREATE TEXT SEARCH DICTIONARY polish_ispell (
TEMPLATE = ispell,
DictFile = polish,
AffFile = polish,
StopWords = polish
);
ALTER TEXT SEARCH CONFIGURATION polish
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH polish_ispell;
COMMIT;
You’ll probably want to execute it in database migration. For down direction the following SQL should be queried:
BEGIN;
DROP TEXT SEARCH CONFIGURATION IF EXISTS public.polish;
DROP TEXT SEARCH CONFIGURATION IF EXISTS polish;
DROP TEXT SEARCH DICTIONARY IF EXISTS polish_ispell;
COMMIT;
If everything went well, the following query:
SELECT to_tsvector('polish', 'Czuję się mniej więcej tak, jak ktoś, kto bujał w obłokach i nagle spadł.');
Will output:
to_tsvector
------------------------------------------------------------------------------------
'bujać':9 'czuć':1 'jaka':6 'mniej':3 'nagły':13 'obłok':11 'obłoki':11 'spaść':14
None
Comments