utf8 doesn't work for Chinese?


dayqz

Name: dayqz
Posts: 4

2006-12-25 08:04:39


This is quite strange.
I compiled everything from source, then set the MySQL encoding to utf8:
mysql> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/share/mysql/charsets/ |
+--------------------------+----------------------------------+
I added these lines to sphinx.conf:
sql_query_pre = SET NAMES utf8
charset_type = utf-8
Everything works fine for English.
But when I filled the table with Chinese sentences like these
(804, 1, NOW(), 'title', '为 建 设 一 个 和 平 发 展 '),
(805, 1, NOW(), 'title', '文 明 进 步 的 世 界 而 继 续 努 力 奋 斗 ')
and then searched with the command
/usr/local/bin/search 的
Sphinx returned 0 results.

I thought that since I inserted a space between each Chinese glyph, Sphinx should have no
problem indexing each record.

Can anybody point out what I missed?
Thank you.

dayqz

Name: dayqz
Posts: 4

to: dayqz, 2006-12-25 08:10:15


plus:
mysql Server version: 5.1.14-beta Source distribution
Sphinx 0.9.7-rc2

Nordic

Posts: 299

to: dayqz, 2006-12-26 11:19:38


> plus:
> mysql Server version: 5.1.14-beta Source distribution
> Sphinx 0.9.7-rc2

It's because of the issue of CJK languages and their glyphs.

There are very few spaces in CJK text; unlike English, it does not use regular spaces to
separate individual words. This means a string of CJK characters, no matter how long the
sequence, is seen as a single word in the eyes of Sphinx.

The only way I see Andrew fixing this is if during indexing he separates characters in
the CJK Unicode spec ranges and indexes each individual glyph as a separate word.

With some C++ knowledge you could take a look at the Sphinx source yourself and give it a
shot. This is only a moderate solution, however; systems like Google have better
knowledge of the dictionaries of such languages and thus know much more about the
language rules, grammar, etc.

You would also need to alter the charset_table in sphinx.conf.

Here’s one I’ve created, it includes Arabic, accented Latin and CJK ranges I think. I
can’t remember exactly which Unicode ranges I enabled, lol:

index common {
        min_word_len = 1
        charset_type = utf-8
        charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z, \
        A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6, \
        U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, \
        U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, \
        U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, \
        U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117, \
        U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D, \
        U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135, \
        U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C, \
        U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144, \
        U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B, \
        U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153, \
        U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159, \
        U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161, \
        U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167, \
        U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F, \
        U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175, \
        U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C, \
        U+017C, U+017D->U+017E, U+017E, U+4E00..U+9FFF
        docinfo = extern
        morphology = none
}

shodan

Name: Andrew Aksyonoff
Posts: 4275

to: dayqz, 2006-12-27 03:24:42


> I think since I insert space between each Chinese Glyph, there should be no problem for
> sphinx to index each record.

Yes, but you also need to add Chinese UTF-8 codes range to charset_table - so that Sphinx
would treat them as significant characters.
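
A rough way to picture this, as a toy Python model and not Sphinx's actual tokenizer: every character listed in charset_table is kept (and optionally remapped), while every other character acts as a word separator. The names `parse_charset_table` and `tokenize` are made up for this sketch, and it only understands the single character, `U+XXXX`, `a..b` and `a..b->c..d` forms that appear in this thread.

```python
import re


def _code_point(token):
    """Convert 'U+4E00' or a single literal character to a code point."""
    token = token.strip()
    if re.fullmatch(r"[Uu]\+[0-9A-Fa-f]+", token):
        return int(token[2:], 16)
    if len(token) == 1:
        return ord(token)
    raise ValueError(f"unsupported charset_table token: {token!r}")


def parse_charset_table(spec):
    """Return {source code point: target code point} for a simplified table."""
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        src, _, dst = entry.partition("->")
        src_lo, _, src_hi = src.partition("..")
        lo = _code_point(src_lo)
        hi = _code_point(src_hi) if src_hi else lo
        # Without "->" a character maps to itself.
        dst_lo = _code_point(dst.partition("..")[0]) if dst else lo
        for offset in range(hi - lo + 1):
            mapping[lo + offset] = dst_lo + offset
    return mapping


def tokenize(text, mapping):
    """Split text into words; characters missing from the table are separators."""
    words, current = [], []
    for ch in text:
        target = mapping.get(ord(ch))
        if target is None:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(chr(target))
    if current:
        words.append("".join(current))
    return words


latin_only = parse_charset_table("0..9, A..Z->a..z, a..z")
with_cjk = parse_charset_table("0..9, A..Z->a..z, a..z, U+4E00..U+9FFF")

text = "Sphinx 的 世 界"
print(tokenize(text, latin_only))  # ['sphinx']
print(tokenize(text, with_cjk))    # ['sphinx', '的', '世', '界']
```

In this model the space-separated glyphs only become words once their range is in the table; with the Latin-only table they are dropped like punctuation, which matches the "0 results" symptom described above.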

dayqz

Name: dayqz
Posts: 4

to: Nordic, 2006-12-27 09:32:19


Thank you very much!
It worked!


dayqz

Name: dayqz
Posts: 4

to: shodan, 2006-12-27 09:39:09


Thank you for your ultra fast search engine :)
> > I think since I insert space between each Chinese Glyph, there should be no problem for
> > sphinx to index each record.
>
> Yes, but you also need to add Chinese UTF-8 codes range to charset_table - so that Sphinx
> would treat them as significant characters.

murion

Name: Paul
Posts: 2

to: Nordic, 2007-01-30 00:10:47


> Here’s one I’ve created, it includes Arabic, accented Latin and CJK ranges I think. I
> can’t remember exactly which Unicode ranges I enabled, lol:

This works great for me too, with the PHP API as well, for Chinese characters.

I added U+3000..U+30FF to deal with Japanese characters, and the 'search' utility works,
but not the PHP API. My query.log shows the right query string, but the dump on the
result array shows a broken UTF-8 character as the ['words'] array key, and no matches.

Any ideas?
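
When extending charset_table for another script, it can help to list the code points a sample string actually contains before picking ranges. The sketch below is only an illustration: the `BLOCKS` list is a hand-picked, non-exhaustive subset of Unicode blocks, and `block_of` and `describe` are hypothetical helper names.

```python
# Hand-picked subset of Unicode blocks relevant to this thread (not exhaustive).
BLOCKS = [
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def describe(text):
    """List (code point, block name) for every non-space character."""
    return [(f"U+{ord(ch):04X}", block_of(ch)) for ch in text if not ch.isspace()]


for code, block in describe("日本語 ひらがな カタカナ"):
    print(code, block)
```

Kanji land in U+4E00..U+9FFF, while hiragana and katakana fall inside U+3040..U+30FF, so a range such as U+3000..U+30FF covers both kana blocks plus the CJK punctuation block.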

murion

Name: Paul
Posts: 2

to: murion, 2007-01-30 14:59:14


It works great, actually. The problem was that I used the --rotate switch while
re-indexing. I restarted searchd and it all works.

shodan

Name: Andrew Aksyonoff
Posts: 4275

to: murion, 2007-01-30 17:33:52


> Restarted searchd and it all works.

I suppose that the actual issue was that you changed the config file while searchd was
running. Using --rotate makes it pick up new data only; *not* the config changes yet.

blvm

Name: blvm
Posts: 3

to: dayqz, 2008-06-14 09:47:48



Hello dayqz,

I have the same problem getting Sphinx to work with Chinese. I copied your UTF-8
charset_table into sphinx.conf and rebuilt the index with no errors, but when I search
for words it returns no matches, the same problem as yours.

Also, when I test with the sample data, it works.

I am sorry, my English is not good, so I am posting my conf below. ^_^

Thank you!

______________________________________

sphinx.conf


#########
#........
#########

source companysrc
{
        type = mysql
        sql_host = localhost
        sql_user = test
        sql_pass = test
        sql_db = test
        sql_port = 3306 # optional, default is 3306

        sql_query_pre = SET NAMES utf8
        sql_query = select co_id,co_name,co_person,co_do,co_address,co_phone,co_code from company

        sql_attr_uint = co_id
        sql_attr_str2ordinal = co_name
        sql_attr_str2ordinal = co_address
        sql_attr_str2ordinal = co_do
        sql_attr_str2ordinal = co_person
        sql_attr_str2ordinal = co_code
        sql_attr_str2ordinal = co_phone
        sql_ranged_throttle = 0

}

index company
{
        source = companysrc
        path = /usr/local/sphinx/var/data/company
        docinfo = extern

        mlock = 0
        morphology = none
        min_word_len = 1
        charset_type = utf-8
        html_strip = 0

        charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z, \
                A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6, \
                U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101, \
                U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109, \
                U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F, \
                U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115, U+0116->U+0117,\
                U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D, U+011D,\
                U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133, U+0134->U+0135,\
                U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C, U+013C,\
                U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142, U+0143->U+0144,\
                U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B, U+014B,\
                U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151, U+0152->U+0153,\
                U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159, U+0159,\
                U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F, U+0160->U+0161,\
                U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167, U+0167,\
                U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D, U+016E->U+016F,\
                U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175, U+0175,\
                U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C,\
                U+017C, U+017D->U+017E, U+017E, U+4E00..U+9FFF


}

############
#...........
###########

________________________________________

mysql character


mysql> show variables like '%char%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | utf8                                   |
| character_set_connection | utf8                                   |
| character_set_database   | utf8                                   |
| character_set_filesystem | binary                                 |
| character_set_results    | utf8                                   |
| character_set_server     | utf8                                   |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+

________________________________

mysql table "company" struct

mysql> desc company;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| co_id      | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| co_name    | varchar(50)      | NO   |     |         |                |
| co_person  | varchar(50)      | NO   |     |         |                |
| co_do      | varchar(500)     | NO   |     |         |                |
| co_address | varchar(50)      | NO   |     |         |                |
| co_phone   | varchar(30)      | NO   |     |         |                |
| co_code    | varchar(10)      | NO   |     |         |                |
+------------+------------------+------+-----+---------+----------------+


______________________________

server:

freebsd 6.2 +mysql-5.0.45+sphinx-0.9.8-rc2

Nordic

Posts: 299

to: blvm, 2008-06-14 11:53:40


OK, outside of tests, how are queries submitted to the Sphinx API?

Via a web page? What encoding is that web page?

That will determine what encoding the form data is sent as.
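
To see why the page encoding matters, here is a small Python sketch (not part of the Sphinx API) that compares the bytes the same query produces under two encodings and checks whether they are valid UTF-8. The sample string and the helper name `is_valid_utf8` are arbitrary choices for illustration; GBK stands in for any non-UTF-8 page encoding.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def hex_bytes(raw):
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in raw)


query = "电话"  # arbitrary sample query
as_utf8 = query.encode("utf-8")
as_gbk = query.encode("gbk")

print(hex_bytes(as_utf8))  # e7 94 b5 e8 af 9d
print(hex_bytes(as_gbk))   # b5 e7 bb b0
print(is_valid_utf8(as_utf8), is_valid_utf8(as_gbk))  # True False
```

An index built with charset_type = utf-8 expects the first byte sequence; a form submitted from a GBK page would send the second, which is not valid UTF-8 and cannot match the indexed words.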

blvm

Name: blvm
Posts: 3

to: Nordic, 2008-06-16 03:33:20


> OK, outside of tests, how are queries submitted to the Sphinx API?
>
> Via a web page? What encoding is that web page?
>
> That will determine what encoding the form data is sent as.


Thanks for your answer. I tested via a web page with PHP, and the encoding is UTF-8. For
the Sphinx API, I used the Sphinx sample "test2.php".

test2.php
_________________________________

<?php

echo "begin";

include('sphinxapi.php');

echo "电话1";
$cl = new SphinxClient();
$cl->SetServer( "localhost", 3312 );
$cl->SetMatchMode( SPH_MATCH_ANY );
$cl->SetFilter( 'model', array( 3 ) );

$result = $cl->Query( '电话', 'company' ); // company is the index

if ( $result === false ) {
    echo "Query failed: " . $cl->GetLastError() . ".\n";
}
else {
    if ( $cl->GetLastWarning() ) {
        echo "WARNING: " . $cl->GetLastWarning() . "";
    }

    if ( ! empty($result["matches"]) ) {
        foreach ( $result["matches"] as $doc => $docinfo ) {
            echo "$doc\n";
        }

        #var_dump($result);
        print_r( $result );
    }
}
?>
____________________________________________

I get "电话1" when I echo it directly, but I get nothing from Sphinx, not even an error
message. So, what's wrong with it?

Thank you very much!

blvm

Name: blvm
Posts: 3

to: blvm, 2008-06-16 03:52:36



Also, I get nothing in the CLI either, for example with /usr/local/sphinx/bin/search 1:

[root@saming /usr/local/sphinx/bin]# ./search 1
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'test1': query '1 ': returned 0 matches of 0 total in 0.000 sec

words:
1. '1': 0 documents, 0 hits

index 'company': query '1 ': returned 0 matches of 0 total in 0.002 sec

words:
1. '1': 0 documents, 0 hits

Now what can I do? :(



shodan

Name: Andrew Aksyonoff
Posts: 4275

to: blvm, 2008-06-19 01:09:40


> > $result = $cl->Query( '电话', 'company' ); //company is index
...
> words:
> 1. '1': 0 documents, 0 hits

This means that all those other characters are getting stripped. The problem is most
likely with your charset_table. Double check it, reindex everything, restart searchd, etc.

itguru

Name: Zeeshan
Posts: 4

to: shodan, 2010-02-04 10:40:54


Hi,
I am also not getting results for Arabic. Here is my Sphinx config.
NOTE: My field is utf8_bin, the table collation is latin1_swedish_ci, and I cannot change
my table type. When I use ngrams_chars I get this: "ERROR: unknown key name 'ngrams_chars'"

source my_source
{
                type = mysql
                sql_host = localhost
                sql_user = user
                sql_pass = pass
                sql_db = my_db
                sql_sock = /path/mysql.sock
                sql_port = 3306

                sql_query_pre = SET CHARACTER_SET_RESULTS=utf8
                sql_query_pre = SET NAMES utf8
                sql_query = \
                                SELECT my_fields \
                                FROM mytable \
                                INNER JOIN table2 ON (mytable.field = table2.field)
}
index my_source
{
                source = my_source
                path = /path/my_source
                min_prefix_len = 0
                min_infix_len = 0
                min_word_len = 1
                charset_type = utf-8
                charset_table = 0..9, a..z, _, A..Z->a..z,U+0622->U+0627, U+0623->U+0627, U+0624->U+0648, \
                U+0625->U+0627, U+0626->U+064A, U+06C0->U+06D5, U+06C2->U+06C1, U+06D3->U+06D2, \
                U+FB50->U+0671, U+FB51->U+0671, U+FB52->U+067B, U+FB53->U+067B, U+FB54->U+067B, \
                U+FB56->U+067E, U+FB57->U+067E, U+FB58->U+067E, U+FB5A->U+0680, U+FB5B->U+0680, \
                U+FB5C->U+0680, U+FB5E->U+067A, U+FB5F->U+067A, U+FB60->U+067A, U+FB62->U+067F, \
                U+FB63->U+067F, U+FB64->U+067F, U+FB66->U+0679, U+FB67->U+0679, U+FB68->U+0679, \
                U+FB6A->U+06A4, U+FB6B->U+06A4, U+FB6C->U+06A4, U+FB6E->U+06A6, U+FB6F->U+06A6, \
                U+FB70->U+06A6, U+FB72->U+0684, U+FB73->U+0684, U+FB74->U+0684, U+FB76->U+0683, \
                U+FB77->U+0683, U+FB78->U+0683, U+FB7A->U+0686, U+FB7B->U+0686, U+FB7C->U+0686, \
                U+FB7E->U+0687, U+FB7F->U+0687, U+FB80->U+0687, U+FB82->U+068D, U+FB83->U+068D, \
                U+FB84->U+068C, U+FB85->U+068C, U+FB86->U+068E, U+FB87->U+068E, U+FB88->U+0688, \
                U+FB89->U+0688, U+FB8A->U+0698, U+FB8B->U+0698, U+FB8C->U+0691, U+FB8D->U+0691, \
                U+FB8E->U+06A9, U+FB8F->U+06A9, U+FB90->U+06A9, U+FB92->U+06AF, U+FB93->U+06AF, \
                U+FB94->U+06AF, U+FB96->U+06B3, U+FB97->U+06B3, U+FB98->U+06B3, U+FB9A->U+06B1, \
                U+FB9B->U+06B1, U+FB9C->U+06B1, U+FB9E->U+06BA, U+FB9F->U+06BA, U+FBA0->U+06BB, \
                U+FBA1->U+06BB, U+FBA2->U+06BB, U+FBA4->U+06C0, U+FBA5->U+06C0, U+FBA6->U+06C1, \
                U+FBA7->U+06C1, U+FBA8->U+06C1, U+FBAA->U+06BE, U+FBAB->U+06BE, U+FBAC->U+06BE, \
                U+FBAE->U+06D2, U+FBAF->U+06D2, U+FBB0->U+06D3, U+FBB1->U+06D3, U+FBD3->U+06AD, \
                U+FBD4->U+06AD, U+FBD5->U+06AD, U+FBD7->U+06C7, U+FBD8->U+06C7, U+FBD9->U+06C6, \
                U+FBDA->U+06C6, U+FBDB->U+06C8, U+FBDC->U+06C8, U+FBDD->U+0677, U+FBDE->U+06CB, \
                U+FBDF->U+06CB, U+FBE0->U+06C5, U+FBE1->U+06C5, U+FBE2->U+06C9, U+FBE3->U+06C9, \
                U+FBE4->U+06D0, U+FBE5->U+06D0, U+FBE6->U+06D0, U+FBE8->U+0649, U+FBFC->U+06CC, \
                U+FBFD->U+06CC, U+FBFE->U+06CC, U+0621, U+0627..U+063A, U+0641..U+064A, U+0660..U+0669, \
                U+066E, U+066F, U+0671..U+06BF, U+06C1, U+06C3..U+06D2, U+06D5, U+06EE..U+06FC, U+06FF, \
                U+0750..U+076D, U+FB55, U+FB59, U+FB5D, U+FB61, U+FB65, U+FB69, U+FB6D, U+FB71, U+FB75, \
                U+FB79, U+FB7D, U+FB81, U+FB91, U+FB95, U+FB99, U+FB9D, U+FBA3, U+FBA9, U+FBAD, U+FBD6, \
                U+FBE7, U+FBE9, U+FBFF

                ngram_len = 1
}
indexer
{
                                mem_limit = 1024M
                                max_iosize = 1248576
}
searchd
{
                                port = 3312
                                max_matches = 1000000
                                log = /path/searchd.log
                                query_log = /path/query.log
                                pid_file = /path/searchd.pid
}



Copyright © Sphinx Technologies Inc, 2009