.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  | will give a
.\" real vertical bar.  \*(C+ will give a nicer C++.  Capital omega is used to
.\" do unbreakable dashes and therefore won't be available.  \*(C` and \*(C'
.\" expand to `' in nroff, nothing in troff, for use with C<>.
.tr \(*W-|\(bv\*(Tr
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.if \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.\"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Mail::SpamAssassin::Plugin::TextCat 3"
.TH Mail::SpamAssassin::Plugin::TextCat 3 "2010-03-16" "perl v5.8.8" "User Contributed Perl Documentation"
.SH "NAME"
Mail::SpamAssassin::Plugin::TextCat \- TextCat language guesser
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  loadplugin     Mail::SpamAssassin::Plugin::TextCat
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This plugin will try to guess the language used in the message text.
.PP
You can then specify which languages are considered okay for incoming
mail.  If the guessed language is not okay, \f(CW\*(C`UNWANTED_LANGUAGE_BODY\*(C'\fR
is triggered.
.PP
The plugin always adds the results to an \*(L"X\-Language\*(R" name-value pair in
the message metadata data structure.  These values may be useful as Bayes
tokens.  The results can also be added to marked-up messages using
\&\*(L"add_header\*(R" with the _LANGUAGES_ tag.  See
Mail::SpamAssassin::Conf for details.
.PP
Note: the language cannot always be recognized with sufficient
confidence.  In that case, \f(CW\*(C`UNWANTED_LANGUAGE_BODY\*(C'\fR will not trigger.
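.PP
As an illustrative sketch, a site that expects only English and German mail
might combine the plugin with a header and a score.  The language choice,
the header name \*(L"Languages\*(R", and the score value are assumptions made for
this example, not shipped defaults:
.PP
.Vb 4
\&  loadplugin     Mail::SpamAssassin::Plugin::TextCat
\&  ok_languages   en de
\&  add_header     all Languages _LANGUAGES_
\&  score          UNWANTED_LANGUAGE_BODY 2.0
.Ve
.PP
With \*(L"add_header\*(R" used this way, the guessed languages should appear in
an X\-Spam\-Languages header on processed messages.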
.SH "USER OPTIONS"
.IX Header "USER OPTIONS"
.IP "ok_languages xx [ yy zz ... ]		(default: all)" 4
.IX Item "ok_languages xx [ yy zz ... ]		(default: all)"
This option is used to specify which languages are considered okay for
incoming mail.  SpamAssassin will try to detect the language used in the
message text.
.Sp
Note that the language cannot always be recognized with sufficient
confidence.  In that case, no points will be assigned.
.Sp
The rule \f(CW\*(C`UNWANTED_LANGUAGE_BODY\*(C'\fR is triggered when the guessed
language is not among the languages listed here.
.Sp
In your configuration, you must use the two- or three-letter language
specifier in lowercase, not the English name for the language.  You may
also specify \f(CW\*(C`all\*(C'\fR if a desired language is not listed, or if you want to
allow any language.  The default setting is \f(CW\*(C`all\*(C'\fR.
.Sp
Examples:
.Sp
.Vb 3
\&  ok_languages all         (allow all languages)
\&  ok_languages en          (only allow English)
\&  ok_languages en ja zh    (allow English, Japanese, and Chinese)
.Ve
.Sp
Note: if there are multiple ok_languages lines, only the last one is used.
.Sp
Select the languages to allow from the list below:
.RS 4
.IP "af	\- Afrikaans" 4
.IX Item "af	- Afrikaans"
.PD 0
.IP "am	\- Amharic" 4
.IX Item "am	- Amharic"
.IP "ar	\- Arabic" 4
.IX Item "ar	- Arabic"
.IP "be	\- Byelorussian" 4
.IX Item "be	- Byelorussian"
.IP "bg	\- Bulgarian" 4
.IX Item "bg	- Bulgarian"
.IP "bs	\- Bosnian" 4
.IX Item "bs	- Bosnian"
.IP "ca	\- Catalan" 4
.IX Item "ca	- Catalan"
.IP "cs	\- Czech" 4
.IX Item "cs	- Czech"
.IP "cy	\- Welsh" 4
.IX Item "cy	- Welsh"
.IP "da	\- Danish" 4
.IX Item "da	- Danish"
.IP "de	\- German" 4
.IX Item "de	- German"
.IP "el	\- Greek" 4
.IX Item "el	- Greek"
.IP "en	\- English" 4
.IX Item "en	- English"
.IP "eo	\- Esperanto" 4
.IX Item "eo	- Esperanto"
.IP "es	\- Spanish" 4
.IX Item "es	- Spanish"
.IP "et	\- Estonian" 4
.IX Item "et	- Estonian"
.IP "eu	\- Basque" 4
.IX Item "eu	- Basque"
.IP "fa	\- Persian" 4
.IX Item "fa	- Persian"
.IP "fi	\- Finnish" 4
.IX Item "fi	- Finnish"
.IP "fr	\- French" 4
.IX Item "fr	- French"
.IP "fy	\- Frisian" 4
.IX Item "fy	- Frisian"
.IP "ga	\- Irish Gaelic" 4
.IX Item "ga	- Irish Gaelic"
.IP "gd	\- Scottish Gaelic" 4
.IX Item "gd	- Scottish Gaelic"
.IP "he	\- Hebrew" 4
.IX Item "he	- Hebrew"
.IP "hi	\- Hindi" 4
.IX Item "hi	- Hindi"
.IP "hr	\- Croatian" 4
.IX Item "hr	- Croatian"
.IP "hu	\- Hungarian" 4
.IX Item "hu	- Hungarian"
.IP "hy	\- Armenian" 4
.IX Item "hy	- Armenian"
.IP "id	\- Indonesian" 4
.IX Item "id	- Indonesian"
.IP "is	\- Icelandic" 4
.IX Item "is	- Icelandic"
.IP "it	\- Italian" 4
.IX Item "it	- Italian"
.IP "ja	\- Japanese" 4
.IX Item "ja	- Japanese"
.IP "ka	\- Georgian" 4
.IX Item "ka	- Georgian"
.IP "ko	\- Korean" 4
.IX Item "ko	- Korean"
.IP "la	\- Latin" 4
.IX Item "la	- Latin"
.IP "lt	\- Lithuanian" 4
.IX Item "lt	- Lithuanian"
.IP "lv	\- Latvian" 4
.IX Item "lv	- Latvian"
.IP "mr	\- Marathi" 4
.IX Item "mr	- Marathi"
.IP "ms	\- Malay" 4
.IX Item "ms	- Malay"
.IP "ne	\- Nepali" 4
.IX Item "ne	- Nepali"
.IP "nl	\- Dutch" 4
.IX Item "nl	- Dutch"
.IP "no	\- Norwegian" 4
.IX Item "no	- Norwegian"
.IP "pl	\- Polish" 4
.IX Item "pl	- Polish"
.IP "pt	\- Portuguese" 4
.IX Item "pt	- Portuguese"
.IP "qu	\- Quechua" 4
.IX Item "qu	- Quechua"
.IP "rm	\- Rhaeto-Romance" 4
.IX Item "rm	- Rhaeto-Romance"
.IP "ro	\- Romanian" 4
.IX Item "ro	- Romanian"
.IP "ru	\- Russian" 4
.IX Item "ru	- Russian"
.IP "sa	\- Sanskrit" 4
.IX Item "sa	- Sanskrit"
.IP "sco	\- Scots" 4
.IX Item "sco	- Scots"
.IP "sk	\- Slovak" 4
.IX Item "sk	- Slovak"
.IP "sl	\- Slovenian" 4
.IX Item "sl	- Slovenian"
.IP "sq	\- Albanian" 4
.IX Item "sq	- Albanian"
.IP "sr	\- Serbian" 4
.IX Item "sr	- Serbian"
.IP "sv	\- Swedish" 4
.IX Item "sv	- Swedish"
.IP "sw	\- Swahili" 4
.IX Item "sw	- Swahili"
.IP "ta	\- Tamil" 4
.IX Item "ta	- Tamil"
.IP "th	\- Thai" 4
.IX Item "th	- Thai"
.IP "tl	\- Tagalog" 4
.IX Item "tl	- Tagalog"
.IP "tr	\- Turkish" 4
.IX Item "tr	- Turkish"
.IP "uk	\- Ukrainian" 4
.IX Item "uk	- Ukrainian"
.IP "vi	\- Vietnamese" 4
.IX Item "vi	- Vietnamese"
.IP "yi	\- Yiddish" 4
.IX Item "yi	- Yiddish"
.IP "zh	\- Chinese (both Traditional and Simplified)" 4
.IX Item "zh	- Chinese (both Traditional and Simplified)"
.IP "zh.big5	\- Chinese (Traditional only)" 4
.IX Item "zh.big5	- Chinese (Traditional only)"
.IP "zh.gb2312	\- Chinese (Simplified only)" 4
.IX Item "zh.gb2312	- Chinese (Simplified only)"
.RE
.RS 4
.PD
.Sp
\&\&
.RE
.IP "inactive_languages xx [ yy zz ... ]		(default: see below)" 4
.IX Item "inactive_languages xx [ yy zz ... ]		(default: see below)"
This option is used to specify which languages will not be considered
when trying to guess the language.  For performance reasons, supported
languages that have fewer than about 5 million speakers are disabled by
default.  Note that listing a language in \f(CW\*(C`ok_languages\*(C'\fR automatically
enables it for that user.
.Sp
The default setting is:
.RS 4
.IP "bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi" 4
.IX Item "bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi"
.RE
.RS 4
.Sp
That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian, Irish
Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian, Latvian,
Rhaeto\-Romance, Sanskrit, Scots, Slovenian, and Yiddish.
.RE
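.Sp
For example, to have Lithuanian and Latvian considered while leaving the
other defaults disabled, the default list can be restated without
\&\f(CW\*(C`lt\*(C'\fR and \f(CW\*(C`lv\*(C'\fR.  This is a sketch derived from the default list
above, on the assumption that the option replaces the default list rather
than extending it:
.Sp
.Vb 1
\&  inactive_languages bs cy eo et eu fy ga gd is la rm sa sco sl yi
.Ve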
.IP "textcat_max_languages N (default: 5)" 4
.IX Item "textcat_max_languages N (default: 5)"
The maximum number of languages the guesser may match.  If more languages
than this match, the classification is considered unknown.
.IP "textcat_optimal_ngrams N (default: 0)" 4
.IX Item "textcat_optimal_ngrams N (default: 0)"
Ngrams whose occurrence count is lower than this number are removed.  This
can be used to speed up the program for longer inputs.  For shorter inputs, this
should be set to 0.
.IP "textcat_max_ngrams N (default: 400)" 4
.IX Item "textcat_max_ngrams N (default: 400)"
The maximum number of ngrams that should be compared with each of the language
models (note that each of those models is used completely).
.IP "textcat_acceptable_score N (default: 1.05)" 4
.IX Item "textcat_acceptable_score N (default: 1.05)"
Include any language that scores at least \f(CW\*(C`textcat_acceptable_score\*(C'\fR in the
returned list of languages.
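.Sp
The four \f(CW\*(C`textcat_\*(C'\fR tuning options can be set together in the
configuration.  The values below simply restate the documented defaults and
are meant only as a starting point for local adjustment:
.Sp
.Vb 4
\&  textcat_max_languages     5
\&  textcat_optimal_ngrams    0
\&  textcat_max_ngrams        400
\&  textcat_acceptable_score  1.05
.Ve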