Sphinx
utf8 doesn't work for Chinese?
Common forum |
dayqz
Name: dayqz Posts: 4
2006-12-25 08:04:39
This is quite strange.
I compiled everything from source. Then I set the MySQL encoding to utf8:
mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
I put these lines in sphinx.conf:
sql_query_pre = SET NAMES utf8
charset_type = utf-8
Everything works fine for English.
But when I filled the table with Chinese sentences like this
(804, 1, NOW(), 'title', '为 建 设 一 个 和 平 发 展 '),
(805, 1, NOW(), 'title', '文 明 进 步 的 世 界 而 继 续 努 力 奋 斗 ')
and then searched with the command:
/usr/local/bin/search 的
Sphinx returned 0 results.
I thought that since I inserted a space between each Chinese glyph, Sphinx should have no
problem indexing each record.
Can anybody point out what I missed?
Thank you.
dayqz
Name: dayqz Posts: 4
to: dayqz, 2006-12-25 08:10:15
plus:
mysql Server version: 5.1.14-beta Source distribution
Sphinx 0.9.7-rc2
Nordic
Posts: 299
to: dayqz, 2006-12-26 11:19:38
> plus:
> mysql Server version: 5.1.14-beta Source distribution
> Sphinx 0.9.7-rc2
It's because of how CJK languages are written.
There are very few spaces in CJK text. Unlike English, words are not separated by regular
spaces. This means a string of CJK characters, no matter how long, is seen as a single
word by Sphinx.
The only way I see Andrew fixing this is if, during indexing, he splits out characters in
the CJK Unicode ranges and indexes each individual glyph as a separate word.
With some C++ knowledge you could look at the Sphinx source yourself and give it a shot.
This is only a partial solution, however. Systems like Google have dictionaries for such
languages and thus know much more about their rules, grammar, etc.
You would also need to alter the charset_table in sphinx.conf.
Here's one I've created. I think it includes Arabic, accented Latin and CJK ranges, but I
can't remember exactly which Unicode ranges I enabled, lol:
index common {
min_word_len = 1
charset_type = utf-8
charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z, \
A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6, \
U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, \
U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, \
U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, \
U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117, \
U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D, \
U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135, \
U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C, \
U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144, \
U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B, \
U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153, \
U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159, \
U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161, \
U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167, \
U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F, \
U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175, \
U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C, \
U+017C, U+017D->U+017E, U+017E, U+4E00..U+9FFF
docinfo = extern
morphology = none
}
shodan
Name: Andrew Aksyonoff Posts: 4275
to: dayqz, 2006-12-27 03:24:42
> I think since I insert space between each Chinese Glyph, there should be no problem for
> sphinx to index each record.
Yes, but you also need to add the Chinese Unicode range to charset_table, so that Sphinx
treats those characters as significant characters rather than as separators.
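As a rough illustration of this point, here is a simplified Python model of charset_table. Characters the table lists (after any `->` folding) build words, and every other character is treated as a separator and dropped. It is a sketch, not Sphinx's parser. It only understands the `x`, `x..y`, `U+XXXX` and `a->b` forms used in this thread, and `parse_charset_table` and `tokenize` are hypothetical names.

```python
def parse_char(token: str) -> int:
    token = token.strip()
    if token.upper().startswith("U+"):
        return int(token[2:], 16)      # U+4E00 style code point
    if len(token) == 1:
        return ord(token)              # literal character such as a or _
    raise ValueError(f"unsupported charset_table token: {token!r}")


def parse_span(spec: str) -> list[int]:
    # "x..y" is an inclusive range, anything else is a single character.
    if ".." in spec:
        lo, hi = spec.split("..")
        return list(range(parse_char(lo), parse_char(hi) + 1))
    return [parse_char(spec)]


def parse_charset_table(table: str) -> dict[int, int]:
    """Map each allowed code point to the code point it is folded to."""
    mapping: dict[int, int] = {}
    for entry in table.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            src, dst = (parse_span(part) for part in entry.split("->"))
            if len(src) != len(dst):
                raise ValueError(f"range length mismatch: {entry!r}")
            mapping.update(zip(src, dst))          # e.g. A..Z->a..z folds case
        else:
            mapping.update((cp, cp) for cp in parse_span(entry))
    return mapping


def tokenize(text: str, mapping: dict[int, int]) -> list[str]:
    words, current = [], []
    for ch in text:
        folded = mapping.get(ord(ch))
        if folded is None:             # not in charset_table: acts as a separator
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(chr(folded))
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    latin_only = parse_charset_table("0..9, A..Z->a..z, a..z")
    with_cjk = parse_charset_table("0..9, A..Z->a..z, a..z, U+4E00..U+9FFF")
    print(tokenize("Sphinx 的 test", latin_only))   # ['sphinx', 'test']
    print(tokenize("Sphinx 的 test", with_cjk))     # ['sphinx', '的', 'test']
    print(tokenize("文明进步", with_cjk))            # ['文明进步']
```

The first two lines mirror the original problem: the space-separated 的 simply vanishes until U+4E00..U+9FFF is added. The last line shows the other caveat raised in this thread: without spaces, a run of ideographs still comes out as one word.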
dayqz
Name: dayqz Posts: 4
to: Nordic, 2006-12-27 09:32:19
Thank you very much!
It worked!
dayqz
Name: dayqz Posts: 4
to: shodan, 2006-12-27 09:39:09
Thank you for your ultra fast search engine :)
> > I think since I insert space between each Chinese Glyph, there should be no problem for
> > sphinx to index each record.
>
> Yes, but you also need to add Chinese UTF-8 codes range to charset_table - so that Sphinx
> would treat them as significant characters.
murion
Name: Paul Posts: 2
to: Nordic, 2007-01-30 00:10:47
> Here’s one I’ve created, it includes Arabic, accented Latin and CJK ranges I think. I
> can’t remember exactly which Unicode ranges I enabled, lol:
This works great for me too, including through the PHP API, for Chinese characters.
I added U+3000..U+30FF to deal with Japanese characters, and the 'search' utility works,
but the PHP API does not. My query.log shows the right query string, but dumping the
result array shows a broken UTF-8 character as the ['words'] array key, and no matches.
Any ideas?
murion
Name: Paul Posts: 2
to: murion, 2007-01-30 14:59:14
> This works great for me too, with the PHP API as well, for Chinese characters.
>
> I added U+3000..U+30FF to deal with Japanese characters, and the 'search' utility works,
> but not the PHP API. My query.log shows the right query string, but the dump on the
> result array shows a broken UTF-8 character as the ['words'] array key, and no matches.
>
> Any ideas?
It works great, actually. The problem was that I used the --rotate switch while
re-indexing. I restarted searchd and it all works.
shodan
Name: Andrew Aksyonoff Posts: 4275
to: murion, 2007-01-30 17:33:52
> Restarted searchd and it all works.
I suppose the actual issue was that you changed the config file while searchd was
running. Using --rotate only makes it pick up new data; it does *not* pick up config
changes yet.
blvm
Name: blvm Posts: 3
to: dayqz, 2008-06-14 09:47:48
Hello dayqz,
I have the same problem with Sphinx and Chinese. I copied your script (the UTF-8
charset_table) into sphinx.conf and rebuilt the index with no errors, but when I search
for words it returns no matches, the same problem as yours.
Also, when I use the sample to test, it works.
Sorry, my English is bad! So I am posting my config below. ^_^
Thank you!
______________________________________
sphinx.conf
#########
#........
#########
source companysrc
{
type = mysql
sql_host = localhost
sql_user = test
sql_pass = test
sql_db = test
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = select co_id,co_name,co_person,co_do,co_address,co_phone,co_code from company
sql_attr_uint = co_id
sql_attr_str2ordinal = co_name
sql_attr_str2ordinal = co_address
sql_attr_str2ordinal = co_do
sql_attr_str2ordinal = co_person
sql_attr_str2ordinal = co_code
sql_attr_str2ordinal = co_phone
sql_ranged_throttle = 0
}
index company
{
source = companysrc
path = /usr/local/sphinx/var/data/company
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
charset_type = utf-8
html_strip = 0
charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z, \
A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6, \
U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, \
U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, \
U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, \
U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117,\
U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D,\
U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135,\
U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C,\
U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144,\
U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B,\
U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153,\
U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159,\
U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161,\
U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167,\
U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F,\
U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175,\
U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C,\
U+017C, U+017D->U+017E, U+017E, U+4E00..U+9FFF
}
############
#...........
###########
________________________________________
mysql character
mysql> show variables like '%char%';
+--------------------------+----------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
________________________________
mysql table "company" struct
mysql> desc company;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| co_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| co_name | varchar(50) | NO | | | |
| co_person | varchar(50) | NO | | | |
| co_do | varchar(500) | NO | | | |
| co_address | varchar(50) | NO | | | |
| co_phone | varchar(30) | NO | | | |
| co_code | varchar(10) | NO | | | |
+------------+------------------+------+-----+---------+----------------+
______________________________
server:
FreeBSD 6.2 + MySQL 5.0.45 + Sphinx 0.9.8-rc2
Nordic
Posts: 299
to: blvm, 2008-06-14 11:53:40
OK, outside of tests, how are queries submitted to the Sphinx API?
Via a web page? What encoding is that web page?
That will determine what encoding the form data is sent as.
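This point can be checked directly. The same text produces different bytes depending on the page encoding, and an index built with charset_type = utf-8 only matches UTF-8 bytes. The Python sketch below uses the word 电话 ("telephone"). `to_utf8` is a hypothetical helper that re-encodes form data from a known page encoding. The sketch assumes you actually know which encoding the page used. In PHP the same step is usually done with mb_convert_encoding() or iconv().

```python
def to_utf8(raw: bytes, page_encoding: str) -> bytes:
    """Hypothetical helper: re-encode form data from the page's encoding to UTF-8."""
    return raw.decode(page_encoding).encode("utf-8")


text = "电话"                          # "telephone"
utf8_bytes = text.encode("utf-8")      # 3 bytes per character for these ideographs
gbk_bytes = text.encode("gbk")         # 2 bytes per character in GBK

print(utf8_bytes.hex(" "))             # e7 94 b5 e8 af 9d
print(gbk_bytes.hex(" "))              # b5 e7 bb b0

# GBK bytes read as if they were Latin-1 turn into unrelated Latin characters.
print(gbk_bytes.decode("latin-1"))     # µç»°

# Decoding with the correct source encoding recovers the UTF-8 bytes the index expects.
print(to_utf8(gbk_bytes, "gbk") == utf8_bytes)   # True
```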
blvm
Name: blvm Posts: 3
to: Nordic, 2008-06-16 03:33:20
> OK, outside of tests, how are queries submitted to the Sphinx API?
>
> Via a web page? What encoding is that web page?
>
> That will determine what encoding the form data is sent as.
Thanks for your answer. I tested via a web page with PHP, and the encoding is UTF-8. For
the Sphinx API, I use the Sphinx sample "test2.php":
test2.php
_________________________________
<?php
echo "begin";
include('sphinxapi.php');
echo "电话1";
$cl = new SphinxClient();
$cl->SetServer( "localhost", 3312 );
$cl->SetMatchMode( SPH_MATCH_ANY );
// NOTE: this filter comes from the sample; the 'company' index defined above has no 'model' attribute
$cl->SetFilter( 'model', array( 3 ) );
$result = $cl->Query( '电话', 'company' ); //company is index
if ( $result === false ) {
echo "Query failed: " . $cl->GetLastError() . ".\n";
}
else {
if ( $cl->GetLastWarning() ) {
echo "WARNING: " . $cl->GetLastWarning() . "";
}
if ( ! empty($result["matches"]) ) {
foreach ( $result["matches"] as $doc => $docinfo ) {
echo "$doc\n";
}
#var_dump($result);
print_r( $result );
}
}
?>
____________________________________________
I can see "电话1" when I echo it directly, but I get nothing from Sphinx, not even an error
message. So what's wrong with it?
Thank you very much!
blvm
Name: blvm Posts: 3
to: blvm, 2008-06-16 03:52:36
Plus, I get nothing in the CLI either, for example: /usr/local/sphinx/bin/search 1
[root@saming /usr/local/sphinx/bin]# ./search 1
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'test1': query '1 ': returned 0 matches of 0 total in 0.000 sec
words:
1. '1': 0 documents, 0 hits
index 'company': query '1 ': returned 0 matches of 0 total in 0.002 sec
words:
1. '1': 0 documents, 0 hits
Now what can I do? :=(
shodan
Name: Andrew Aksyonoff Posts: 4275
to: blvm, 2008-06-19 01:09:40
> > $result = $cl->Query( 'µç»°', 'company' ); //company is index
...
> words:
> 1. '1': 0 documents, 0 hits
This means that all those other characters are getting stripped. The problem is most
likely with your charset_table. Double check it, reindex everything, restart searchd, etc.
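One way to double-check a charset_table is to list which characters of a failing query fall outside every range it allows. The Python sketch below does that with (start, end) code point ranges copied in by hand. `uncovered` is a hypothetical helper. It ignores `->` folding, since only the source side of a mapping decides whether a character is kept. The range list is a small illustrative subset, not a full config from this thread.

```python
Range = tuple[int, int]


def uncovered(text: str, ranges: list[Range]) -> list[str]:
    """Return the non-space characters of `text` that no (start, end) range covers,
    formatted as U+XXXX so they can be pasted into charset_table."""
    missing = {
        ord(ch)
        for ch in text
        if not ch.isspace() and not any(lo <= ord(ch) <= hi for lo, hi in ranges)
    }
    return [f"U+{cp:04X}" for cp in sorted(missing)]


# Illustrative ranges: 0..9, A..Z, a..z and the CJK Unified Ideographs block.
TABLE = [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A), (0x4E00, 0x9FFF)]

print(uncovered("电话 1", TABLE))      # []
print(uncovered("カタカナ", TABLE))     # ['U+30AB', 'U+30BF', 'U+30CA']

# Adding U+3000..U+30FF (the range used for Japanese earlier in this thread) covers katakana.
print(uncovered("カタカナ", TABLE + [(0x3000, 0x30FF)]))   # []
```

An empty result means every character of the query survives tokenization. In that case the next suspects are a stale index or a searchd that has not been restarted.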
itguru
Name: Zeeshan Posts: 4
to: shodan, 2010-02-04 10:40:54
Hi,
I'm also not getting results for Arabic. Here is my Sphinx config.
NOTE: My field is utf8_bin and the table collation is latin1_swedish_ci, and I cannot
change my table type. When I use ngrams_chars I get this: "ERROR: unknown key name 'ngrams_chars'"
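The error message points to a misspelled key. The Sphinx directive is `ngram_chars` (used together with `ngram_len`), not `ngrams_chars`. One quick way to catch this kind of typo is a fuzzy match against known directive names. This Python sketch uses the standard `difflib.get_close_matches`. The key list is a small, hand-picked subset of index directives (those used in this thread plus `ngram_chars`), not a complete list, and `suggest_key` is a hypothetical helper.

```python
import difflib

# Illustrative subset of index-level directives; not the full list Sphinx accepts.
KNOWN_INDEX_KEYS = [
    "source", "path", "docinfo", "mlock", "morphology", "min_word_len",
    "min_prefix_len", "min_infix_len", "charset_type", "charset_table",
    "html_strip", "ngram_len", "ngram_chars",
]


def suggest_key(name: str) -> list[str]:
    """Return likely intended directive names for an unknown config key."""
    return difflib.get_close_matches(name, KNOWN_INDEX_KEYS, n=3, cutoff=0.8)


print(suggest_key("ngrams_chars"))   # ['ngram_chars']
```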
source my_source
{
type = mysql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = my_db
sql_sock = /path/mysql.sock
sql_port = 3306
sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
sql_query_pre = SET NAMES utf8
sql_query = \
SELECT my_fields \
FROM mytable \
INNER JOIN table2 ON (mytable.field = table2.field)
}
index my_source
{
source = my_source
path = /path/my_source
min_prefix_len = 0
min_infix_len = 0
min_word_len = 1
charset_type = utf-8
charset_table = 0..9, a..z, _, A..Z->a..z,U+0622->U+0627, U+0623->U+0627, U+0624->U+0648, \
U+0625->U+0627, U+0626->U+064A, U+06C0->U+06D5, U+06C2->U+06C1, U+06D3->U+06D2, \
U+FB50->U+0671, U+FB51->U+0671, U+FB52->U+067B, U+FB53->U+067B, U+FB54->U+067B, \
U+FB56->U+067E, U+FB57->U+067E, U+FB58->U+067E, U+FB5A->U+0680, U+FB5B->U+0680, \
U+FB5C->U+0680, U+FB5E->U+067A, U+FB5F->U+067A, U+FB60->U+067A, U+FB62->U+067F, \
U+FB63->U+067F, U+FB64->U+067F, U+FB66->U+0679, U+FB67->U+0679, U+FB68->U+0679, \
U+FB6A->U+06A4, U+FB6B->U+06A4, U+FB6C->U+06A4, U+FB6E->U+06A6, U+FB6F->U+06A6, \
U+FB70->U+06A6, U+FB72->U+0684, U+FB73->U+0684, U+FB74->U+0684, U+FB76->U+0683, \
U+FB77->U+0683, U+FB78->U+0683, U+FB7A->U+0686, U+FB7B->U+0686, U+FB7C->U+0686, \
U+FB7E->U+0687, U+FB7F->U+0687, U+FB80->U+0687, U+FB82->U+068D, U+FB83->U+068D, \
U+FB84->U+068C, U+FB85->U+068C, U+FB86->U+068E, U+FB87->U+068E, U+FB88->U+0688, \
U+FB89->U+0688, U+FB8A->U+0698, U+FB8B->U+0698, U+FB8C->U+0691, U+FB8D->U+0691, \
U+FB8E->U+06A9, U+FB8F->U+06A9, U+FB90->U+06A9, U+FB92->U+06AF, U+FB93->U+06AF, \
U+FB94->U+06AF, U+FB96->U+06B3, U+FB97->U+06B3, U+FB98->U+06B3, U+FB9A->U+06B1, \
U+FB9B->U+06B1, U+FB9C->U+06B1, U+FB9E->U+06BA, U+FB9F->U+06BA, U+FBA0->U+06BB, \
U+FBA1->U+06BB, U+FBA2->U+06BB, U+FBA4->U+06C0, U+FBA5->U+06C0, U+FBA6->U+06C1, \
U+FBA7->U+06C1, U+FBA8->U+06C1, U+FBAA->U+06BE, U+FBAB->U+06BE, U+FBAC->U+06BE, \
U+FBAE->U+06D2, U+FBAF->U+06D2, U+FBB0->U+06D3, U+FBB1->U+06D3, U+FBD3->U+06AD, \
U+FBD4->U+06AD, U+FBD5->U+06AD, U+FBD7->U+06C7, U+FBD8->U+06C7, U+FBD9->U+06C6, \
U+FBDA->U+06C6, U+FBDB->U+06C8, U+FBDC->U+06C8, U+FBDD->U+0677, U+FBDE->U+06CB, \
U+FBDF->U+06CB, U+FBE0->U+06C5, U+FBE1->U+06C5, U+FBE2->U+06C9, U+FBE3->U+06C9, \
U+FBE4->U+06D0, U+FBE5->U+06D0, U+FBE6->U+06D0, U+FBE8->U+0649, U+FBFC->U+06CC, \
U+FBFD->U+06CC, U+FBFE->U+06CC, U+0621, U+0627..U+063A, U+0641..U+064A, U+0660..U+0669, \
U+066E, U+066F, U+0671..U+06BF, U+06C1, U+06C3..U+06D2, U+06D5, U+06EE..U+06FC, U+06FF, \
U+0750..U+076D, U+FB55, U+FB59, U+FB5D, U+FB61, U+FB65, U+FB69, U+FB6D, U+FB71, U+FB75, \
U+FB79, U+FB7D, U+FB81, U+FB91, U+FB95, U+FB99, U+FB9D, U+FBA3, U+FBA9, U+FBAD, U+FBD6, \
U+FBE7, U+FBE9, U+FBFF
ngram_len = 1
}
indexer
{
mem_limit = 1024M
max_iosize = 1248576
}
searchd
{
port = 3312
max_matches = 1000000
log = /path/searchd.log
query_log = /path/query.log
pid_file = /path/searchd.pid
}