$cl->BuildExcerpts doesn't work for Russian text.
ethaniel
Name: Ethaniel Posts: 30
2006-08-26 17:52:29
$cl->BuildExcerpts doesn't work when we input Russian CP-1251 text.
It just removes all the Russian characters.
Converting to UTF or KOI8 doesn't help either.

shodan
Name: Andrew Aksyonoff Posts: 4117
to: ethaniel, 2006-08-27 01:36:30
> Converting to UTF or KOI8 doesn't help either.
It doesn't support encodings other than UTF-8, but UTF-8 really should work. Are you
positive you convert both $docs and query $words to UTF-8?
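
As a hedged illustration of the advice above, the sketch below converts both the documents and the query words from CP-1251 to UTF-8 before they are handed to BuildExcerpts. The helper names cp1251_to_utf8() and utf8_excerpt_args() are made up for this example, and the source encoding is assumed to be CP-1251; iconv() is PHP's standard conversion function.

```php
<?php
// Hypothetical helper: convert one CP-1251 string to UTF-8 with iconv().
function cp1251_to_utf8($s)
{
    $out = iconv('CP1251', 'UTF-8', $s);
    if ($out === false) {
        // iconv() returns false when the input cannot be converted.
        throw new RuntimeException('CP1251 -> UTF-8 conversion failed');
    }
    return $out;
}

// Hypothetical helper: convert the document list AND the query words,
// since both must be in the same encoding for the highlighter to match.
function utf8_excerpt_args(array $docs, $words)
{
    return array(array_map('cp1251_to_utf8', $docs), cp1251_to_utf8($words));
}
```

The returned pair would then be passed as the first and third arguments of $cl->BuildExcerpts, with the index name and $opts left unchanged.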

ethaniel
Name: Ethaniel Posts: 30
to: shodan, 2006-08-27 07:02:54
> > Converting to UTF or KOI8 doesn't help either.
>
> It doesn't support encodings other than UTF-8, but UTF-8 really should work. Are you
> positive you convert both $docs and query $words to UTF-8?
first of all I would like to thank you for this wonderful program. It is just what I
wanted to create for a long long time. Now it will really help me out.
Now regarding UTF. I did convert the docs and the query to UTF-8.
my opts are
$opts = array
(
    "before_match" => "<b>",
    "after_match" => "</b>",
    "chunk_separator" => " ... ",
    "limit" => 400,
    "around" => 3
);
it returns the same text I enter, it doesn't enclose the query with <b></b>.
http://search.nightparty.ru/np.php

shodan
Name: Andrew Aksyonoff Posts: 4117
to: ethaniel, 2006-08-27 09:27:19
> Now regarding UTF. I did convert the docs and the query to UTF-8.
Managed to reproduce that on one of my servers. Will check and fix, thanks for the
report!

ethaniel
Name: Ethaniel Posts: 30
to: shodan, 2006-08-28 16:55:01
> > Now regarding UTF. I did convert the docs and the query to UTF-8.
>
> Managed to reproduce that on one of my servers. Will check and fix, thanks for the
> report!
can't wait for the new version.

shodan
Name: Andrew Aksyonoff Posts: 4117
to: shodan, 2006-08-29 11:41:01
> Managed to reproduce that on one of my servers.
It turns out that charset_table for the index was configured to use SBCS encoding - so
the excerpts code picked it up and, obviously, failed - as it only supports UTF-8 at the moment.
To work around this with 0.9.6, you would either use UTF-8 everywhere - or set up a fake index
with UTF-8 encoding and a proper table, and use this fake index for excerpts generation
only.
I've scheduled adding SBCS support to the excerpts generator; it will be fixed in one of the next releases.

ethaniel
Name: Ethaniel Posts: 30
to: shodan, 2006-09-02 16:45:29
> > Managed to reproduce that on one of my servers.
>
> It turns out that charset_table for the index was configured to use SBCS encoding - so
> the excerpts code picked it up and, obviously, failed - as it only supports UTF-8 at the moment.
>
> To work around this with 0.9.6, you would either use UTF-8 everywhere - or set up a fake index
> with UTF-8 encoding and a proper table, and use this fake index for excerpts generation
> only.
>
> I've scheduled adding SBCS support to the excerpts generator; it will be fixed in one of the next releases.
my dbs are cp1251. mysql 4.0.24 (no collation or stuff like that).
I set utf-8 in the config file, reindexed and now the search is returning zero results.
any ideas? this fix is rather important.

shodan
Name: Andrew Aksyonoff Posts: 4117
to: ethaniel, 2006-09-03 19:02:10
> my dbs are cp1251. mysql 4.0.24 (no collation or stuff like that).
>
> I set utf-8 in the config file, reindexed and now the search is returning zero results.
If Sphinx expects UTF-8, you need to make MySQL provide UTF-8 encoded data to Sphinx when
indexing as well.
Something like sql_query_pre = SET CHARACTER_SET_RESULTS=utf8 should help.

shodan
Name: Andrew Aksyonoff Posts: 4117
to: shodan, 2006-09-04 06:34:54
> Something like sql_query_pre = SET CHARACTER_SET_RESULTS=utf8 should help.
I've just been told that 4.0.24 does not support UTF-8.
In this case, you'll have to set up the main Sphinx index to use cp-1251 (and query it in
cp-1251) and a fake index to generate excerpts in UTF-8 (and pass document data and query
in UTF-8).
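
A minimal sketch of that two-index workaround, under stated assumptions: the site works in CP-1251, searching still goes through the main index as before, and only the excerpts call is routed through a second index configured for UTF-8. The function name build_excerpts_cp1251() and the index name "excerpts_utf8" are hypothetical; $cl is expected to be a SphinxClient from sphinxapi.php, or any object with a compatible BuildExcerpts method.

```php
<?php
// Sketch: build excerpts through a separate UTF-8 index while the rest of
// the application stays in CP-1251. Index name "excerpts_utf8" is hypothetical.
function build_excerpts_cp1251($cl, array $docs, $query, array $opts, $utf8Index = 'excerpts_utf8')
{
    $toUtf8 = function ($s) { return iconv('CP1251', 'UTF-8', $s); };
    $toCp1251 = function ($s) { return iconv('UTF-8', 'CP1251', $s); };

    // Documents and query words are both converted before the call.
    $res = $cl->BuildExcerpts(array_map($toUtf8, $docs), $utf8Index, $toUtf8($query), $opts);
    if (!is_array($res)) {
        return false; // the call failed; the caller can inspect the client's last error
    }
    // Convert the highlighted snippets back for display on a CP-1251 page.
    return array_map($toCp1251, $res);
}
```

The search itself would still be issued against the CP-1251 index with the unconverted query; only the text passed to the excerpts generator makes the round trip through UTF-8.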

ethaniel
Name: Ethaniel Posts: 30
to: shodan, 2006-09-04 14:36:10
> > Something like sql_query_pre = SET CHARACTER_SET_RESULTS=utf8 should help.
>
> I've just been told that 4.0.24 does not support UTF-8.
>
> In this case, you'll have to set up the main Sphinx index to use cp-1251 (and query it in
> cp-1251) and a fake index to generate excerpts in UTF-8 (and pass document data and query
> in UTF-8).
thanks a lot, I guess that should work.
When will you release the main fix for this problem? I'd love to use your system in
production.

ethaniel
Name: Ethaniel Posts: 30
to: ethaniel, 2006-09-05 10:09:57
> thanks a lot, I guess that should work.
> When will you release the main fix for this problem? I'd love to use your system in
> production.
it didn't work.
$text=array(win2utf($text));
$res = $cl->BuildExcerpts ( $text, "utf8", win2utf($q), $opts );
$res is empty.
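I use the following function:
function win2utf($s){
    // Converts CP-1251 Cyrillic letters to UTF-8, byte by byte.
    $c209 = chr(209); $c208 = chr(208); $c129 = chr(129); $c145 = chr(145);
    $t = "";
    for($i=0; $i<strlen($s); $i++) {
        $c=ord($s[$i]);
        if ($c>=192 and $c<=239) $t.=$c208.chr($c-48);   // 0xC0..0xEF -> D0 90..D0 BF
        elseif ($c>239) $t.=$c209.chr($c-112);           // 0xF0..0xFF -> D1 80..D1 8F
        elseif ($c==184) $t.=$c209.$c145;                // 0xB8 (small yo) -> D1 91
        elseif ($c==168) $t.=$c208.$c129;                // 0xA8 (capital yo) -> D0 81
        else $t.=$s[$i];                                 // copied as-is (correct for ASCII only)
    }
    return $t;
}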
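
A hand-written converter like this is easy to get subtly wrong, so one hedged way to check it is to confirm that its output is well-formed UTF-8 and matches what iconv() produces for the same input. The function names is_valid_utf8() and agrees_with_iconv() are invented for this sketch; it assumes the input really is CP-1251.

```php
<?php
// True when $s is well-formed UTF-8: with the /u modifier preg_match()
// returns false (rather than 0 or 1) if the subject is not valid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Compares a custom converter against iconv() for one CP-1251 input.
// $converter is any callable taking a CP-1251 string, e.g. 'win2utf'.
function agrees_with_iconv($converter, $cp1251)
{
    $expected = iconv('CP1251', 'UTF-8', $cp1251);
    $actual = call_user_func($converter, $cp1251);
    return is_valid_utf8($actual) && $actual === $expected;
}
```

Running agrees_with_iconv('win2utf', ...) over a string that includes every Cyrillic letter, including both forms of yo (bytes 0xA8 and 0xB8), would exercise each branch of the converter.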

ethaniel
Name: Ethaniel Posts: 30
to: ethaniel, 2006-09-05 10:25:12
> > thanks a lot, I guess that should work.
> > When will you release the main fix for this problem? I'd love to use your system in
> > production.
>
> it didn't work.
>
> $text=array(win2utf($text));
> $res = $cl->BuildExcerpts ( $text, "utf8", win2utf($q), $opts );
>
> $res is empty.
PLEASE DISREGARD THIS COMMENT. I FORGOT TO RESTART searchd.
Now there is an additional problem. For example, I query "blagodaru" in Russian.
the search returns all results including "blagodara" (which is correct too).
but the BuildExcerpts doesn't select the "blagodara" with <b></b>.
(I'm in UTF-8 mode).

shodan
Name: Andrew Aksyonoff Posts: 4117
to: ethaniel, 2006-09-05 12:06:29
> the search returns all results including "blagodara" (which is correct too).
> but the BuildExcerpts doesn't select the "blagodara" with <b></b>.
This is another feature missing from excerpts generator: as of 0.9.6, it doesn't support
stemming.
Will hopefully be fixed in next release as well.

ethaniel
Name: Ethaniel Posts: 30
to: shodan, 2006-09-05 13:44:45
> Will hopefully be fixed in next release as well.
thanks a lot :) when are you planning to make the next release?
I'd even love to donate someday - your search is perfect for the time being.

shodan
Name: Andrew Aksyonoff Posts: 4117
to: ethaniel, 2006-09-06 07:06:09
> > Will hopefully be fixed in next release as well.
>
> thanks a lot :) when are you planning to make the next release?
Sometime this month.
> your search is perfect for the time being.
Thanks :)

dweis
Name: Tristan Posts: 31
to: shodan, 2006-11-05 11:51:43
> > > Will hopefully be fixed in next release as well.
> >
> > thanks a lot :) when are you planning to make the next release?
>
> Sometime this month.
>
> > your search is perfect for the time being.
>
> Thanks :)
I haven't been able to try it yet: is there any improvement to excerpts in 0.9.7 RC1?

shodan
Name: Andrew Aksyonoff Posts: 4117
to: dweis, 2006-11-06 06:58:05
> I haven't been able to try it yet: is there any improvement to excerpts in 0.9.7
I fixed SBCS excerpts after 0.9.7-rc1. The patch is available upon request. :)

dweis
Name: Tristan Posts: 31
to: shodan, 2006-11-06 14:38:12
> > I haven't been able to try it yet: is there any improvement to excerpts in 0.9.7
>
> I fixed SBCS excerpts after 0.9.7-rc1.
Thanks, that's good news ;)
> The patch is available upon request. :)
I'll wait for the 0.9.7 final :)