Free open-source SQL full-text search engine


News

Mar 28, 2008 - 0.9.8-rc2 released; and 0.9.9 teaser

0.9.8 is getting closer and closer to release tag, and today we're publishing the next release candidate. The feature freeze finally happened. Since RC1, 0.9.8 branch is only getting bugfixes; this is the list of changes that RC2 introduces:

  • fixed extended2 issues with filtering performance; proximity queries; reject-only queries; quorum queries; certain kinds of duplicate-keyword queries;
  • fixed extended1 phrase queries;
  • fixed extended query parser issues with N-grams and proximity/quorum lengths;
  • fixed highlighting (uses 256 words by default now instead of former 10);
  • fixed index-weights vs. multi-queries;
  • fixed Python API id64 handling and error handling in Query();
  • fixed non-working "exceptions" directive ("synonyms" worked fine);
  • fixed ordinal-related issues (in cases of duplicate document IDs and overlong ordinal values);
  • fixed building on BSD family (now properly including signal.h).

0.9.8-rc2 source and Win32 binaries are available from Downloads page. This time I'm also including Win32 binaries with PostgreSQL support, and Win32 MySQL+SphinxSE binary.

But of course we did not spend three weeks on just bugfixes. Major new features are already being added to 0.9.9 branch. We've implemented 64-bit attributes support and per-query attribute overrides there. The things that are in progress include support of arbitrary expressions for groupby keys, and long-awaited config reload on SIGHUP (that should eliminate the need to restart searchd). Going to publicly release 0.9.9 once those are finalized; but even earlier beta testers are welcome as always.

Also it's about time to finish preparing my talk on Sphinx for the MySQL UC 2008.. not to mention the demo for MySQL Expo. Meet you there!

Update: just noticed a class-B bug in the extended query parser (an OR operator occurring immediately after a phrase operator was wrongly treated as a keyword). RC2 sources and binaries on the site are fixed now. Also, the fixed version got a nice r1234 revision tag. Sorry for the inconvenience to those who already grabbed r1231.

Mar 06, 2008 - Let's call it 0.9.8-rc1

Looks like we're feature complete for 0.9.8, and it's time to announce RC1. There are no more internal requirements to 0.9.8-release. All that's left is testing, fixing, and polishing the documentation. A number of nice contributions were also made during the last month: there's an OpenBSD port and a phpBB plugin now (for the sake of completeness, MovableType plugin was spotted, and a WordPress plugin was announced).

Notable changes since the last snapshot are as follows:

  • added shebang syntax (interpreter execution) to config files;
  • added preopen, preopen_indexes, unlink_old config options;
  • added EscapeString(), BuildKeywords() API calls;
  • added long query support to SphinxSE (query column can now be TEXT or BLOB, default limit is 256 KB);
  • added quorum searching support;
  • added query comments (comments are passed verbatim to query.log);
  • added --rotate support on Windows;
  • added 'field' and 'range-query' source type support for MVAs;
  • added MVA support to xmlpipe2;
  • added full support for all xhtml1 named entities to the stripper;
  • updated SphinxSE, Python API, and Java API to support all recently added features;
  • updated unified documentation;
  • optimized expression evaluation, up to 1.5x-2.0x faster now;
  • optimized "frequent|rare" queries in extended2 mode;
  • optimized RAM usage for tiny indexes;
  • now working well on Solaris (believe it or not, unrelated to Sun's recent acquisitions).

I'd also like to use this occasion to remind you that MySQL UC 2008 is due in about 1 month, and I'm going to buy the tickets and finalize the schedule in about a week. So if you were thinking of any possible changes to that schedule of mine (as in: invitations to user group meetings; on-site consulting visits; suggestions to participate in assessment of Bay Area restaurants; etc) then please do email me now.

Jan 28, 2008 - 0.9.8 dev news: more features, and MySQL UC

Thanks to everyone who sponsored the development, this update delivers quite a few nice new features again.

  • added sorting by arbitrary expressions in run time (eg. "@weight+log(price)*2.5"; currently supports abs, ceil, floor, sin, cos, ln, log2, log10, exp, sqrt, min, max, pow, if);
  • added group-by on MVA support, and SetArrayResult() to PHP API;
  • added ordinal sorting with fixed RAM requirements;
  • added field sets syntax (ie. "@(field1,field2) hello @!(field3,field4) world"), and @@relaxed option to query language;
  • added libxml2 support (experimental; must be manually enabled in source for now);
  • added ignore_chars option (to fully ignore in-word characters such as soft hyphenation);
  • added wordcount ranker (SPH_RANK_WORDCOUNT);
  • added workaround for full 32-bit values vs. php 5.2.2+ vs. x64 platform;
  • added iconv support to xmlpipe2.

As usual, there was a number of fixes as well. Two most important ones are that wordforms now reuse charset_table and other index settings (they did not); and that Sphinx compiles on BSD-style systems out of the box again (hotfix for r1065 was available on the forum). It also looks for expat in /usr/local/ now, so xmlpipe2 support is likely to appear on one's BSD box, too.

I also started unifying and updating the documentation towards the release tag. There's no more semi-official documentation embedded in PHP API source; there's “API reference” section in the official HTML one instead. Volunteer English editors out there, anyone?

And now for something completely different, this year we'll be giving a talk on Sphinx at MySQL UC 2008. I'll also be showcasing Sphinx in DotOrg pavilion again, and the request for Sphinx BoF has been filed, too. UC visitors, you now know what and where to attend.

Our participation in MySQL UC also makes a good occasion for on-site visits to the US locations: being at the same continent simplifies the process a bit, you know. If you're interested in an on-site consulting job, please let me know in advance, before the flight tickets are bought (changing the dates on the flights from Russia to USA and back is, well, not totally impossible but definitely.. challenging).

Jan 14, 2008 - 0.9.8 dev news: feature flood

Despite (or maybe thanks to) the holidays the change rate since the last update was pretty high. In addition to a number of bugfixes, there was a plethora of new features and improvements. Major new features include:

  • added “xmlpipe2” source type that lets you wrap arbitrary fields and attributes in a new XML format which indexer understands (“xmlpipe” was limited to only two fixed fields and two fixed attributes);
  • added “wordforms” feature that checks tokenized words against word forms dictionaries, and a new utility called “spelldump” that creates such dictionaries from ispell .dict+.aff files;
  • added ranking modes to ext2 querying engine: you now can choose faster BM25-only or no-ranking modes on per-query basis;
  • added max_iops, max_iosize settings to indexer that enable you to throttle its disk I/O.

Other changes at a glance:

  • added SetFieldWeights() API call to bind field weights by names;
  • added SetMaxQueryTime() API call to limit query execution time;
  • added 64-bit document IDs support everywhere in the API (namely, to SetIDRange(), to UpdateAttributes(), and to result set on PHP instances with 32-bit integers);
  • added per-index HTML stripping settings (html_strip, html_index_attrs, html_remove_elements), and made BuildExcerpts() utilize these settings;
  • added metaphone support;
  • added log file reopen on SIGUSR1 to searchd (for log rotation), and removed log file locking;
  • added --servicename, --iostats switches to searchd;
  • added config file validation, and backslash escaping for number sign.

Proceed to Downloads for a fresh copy of source and Win32 binaries. Ah, and we also have Sphinx Wiki now so if there are any other features I forgot to mention, that would be the right place to help document them!

Dec 12, 2007 - 0.9.8 dev news: lots of bugfixes, and more minor features

Initially I was going to call this entry “maintenance release”, because most of the recent work focused on bugfixing. And bugfixing will continue. We're already stable in core functionality, and deployed in production – but the ultimate goal is to produce rock-solid 0.9.8-release, totally free of major bugs (and then patch it if any issues are uncovered later).

There also was a number of new minor features developed, though, so it's technically not just all about the maintenance. Changes since previous snapshot are as follows:

  • added automated tests;
  • added RPM spec file;
  • added mysql_connect_flags option to MySQL data source;
  • added weight_order, use_boundaries, exact_phrase, single_passage options to BuildExcerpts() API call;
  • added phrase_boundary, phrase_boundary_step options to indexes;
  • rewritten synonyms code (much faster, and supported in SBCS tokenizers now);
  • and, of course, numerous bugfixes.

One important fix-related change which should be mentioned is float attributes support in the APIs. In searchd network protocol, byte order for floats had been changed from machine dependent to network order. This means that Java API works OK with floats now, but older snapshots of 0.9.8 PHP API must be upgraded for floats to work correctly. Only floats (including @geodist) are affected; so if you're not using them yet, you could safely ignore this.
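To illustrate why the byte order matters, here is a minimal Python sketch. It is not the actual searchd or API code: it assumes a 32-bit IEEE 754 float on the wire, and the helper names are made up for this example. It packs a float in network (big-endian) order and shows what a client that assumes little-endian layout would read instead.

```python
import struct


def encode_float_network(value: float) -> bytes:
    """Pack a 32-bit IEEE 754 float in network (big-endian) byte order."""
    return struct.pack(">f", value)


def decode_float_network(data: bytes) -> float:
    """Decode a float the way an up-to-date client would (network order)."""
    return struct.unpack(">f", data)[0]


def decode_float_little_endian(data: bytes) -> float:
    """Decode the same bytes the way a client assuming little-endian would."""
    return struct.unpack("<f", data)[0]


wire = encode_float_network(1.5)
print(wire.hex())                                # 3fc00000
print(decode_float_network(wire))                # 1.5
print(decode_float_little_endian(wire) == 1.5)   # False: wrong byte order
```

A client that still assumes the old, machine-dependent layout would silently get garbage values rather than an error, which is why the API has to be upgraded together with searchd.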

Current revision is r985. Source tarball and Win32 binaries are, as usual, available from Downloads.

Nov 15, 2007 - 0.9.8 dev news: new querying engine, Java API, and commercial services

With a month passed since the last update, we are exactly two times off the initially planned “at-most-biweekly” update schedule. So we are rolling out exactly two major updates to alleviate that.

First, we now have official native Java API. It's still work in progress; and JavaDoc still could be improved by copying everything from reference PHP API verbatim – but most features are implemented, and the interface seems stabilized. Supported JDKs include 1.4 and above; testing was performed with 1.4.2 and 1.5.0; suggestions are welcome.

Second, we implemented a new querying engine (codenamed “extended engine V2”) which is going to gradually replace all the currently existing matching modes. At the moment, it is fully identical to extended mode in functionality, but is much less CPU intensive for some queries. I have already seen improvements of up to 3-5 times in extreme cases. The only currently known case when it's slower is processing complex extended queries with tens to thousands of keywords; but forthcoming optimizations will fix that.

V2 engine is currently in alpha state and does not affect any other matching mode yet. Temporary SPH_MATCH_EXTENDED2 mode was added to provide a way to test it easily. We are in the middle of extensive internal testing process (under simulated production load, and then actual production load) right now. Your independent testing results would be appreciated, too!

Current snapshot, dubbed r909, is available from Downloads section (in both source code and Win32 binary bundle forms).

Finally, in addition to technology improvements, there are notable organizational changes as well.

Sphinx Technologies Inc (a company which develops and supports open-source Sphinx full-text search engine) in association with Percona Inc (a consulting company specializing in MySQL/LAMP architecture and performance) is now officially offering commercial Sphinx services. Sphinx is frequently used either with MySQL, or as a complement to MySQL; and this way we are able to offer full service package if you need assistance with both Sphinx and MySQL.

Quite a month.

Oct 14, 2007 - 0.9.8 dev news: several new features, and many bugfixes

This time, a couple of weeks following Highload was mostly dedicated to bugfixing. Most prominently, SphinxSE should now compile and work fine with most MySQL versions (and I personally tested quite a few).

However it seems we simply can not stop adding new features. So this update also includes:

  • full scan support;
  • per-field prefix/infix indexing support;
  • an option to enable/disable star syntax on per-index basis;
  • an option to specify per-index weights and sum weights for matches coming from different indexes (instead of simply using the match from last specified index).

Full scan, or in other words the long awaited “empty query” feature, means that Sphinx will now process and return all document IDs if there's no query string specified. This is useful in case you do not want to perform any full text search, but rather Sphinx-side filtering or grouping only (which in some specific cases turns out to be more efficient than MySQL-side).

The interesting thing to know about full scan is that there is a simple internal optimization which could throw away blocks of records early. It works best when long ranges of documents (sorted by ID) only have a few different attribute values. So if you're filtering on timestamps which grow along with document IDs, this is going to be especially fast with full scan.
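The idea of throwing away blocks early can be sketched roughly as follows. This is a toy model, not Sphinx internals: it assumes that each block of rows keeps the minimum and maximum of the filtered attribute, and all names and the block size are hypothetical.

```python
def build_blocks(rows, block_size):
    """Split (doc_id, value) rows, sorted by doc_id, into blocks.

    Each block is stored as (min_value, max_value, rows_in_block).
    """
    blocks = []
    for start in range(0, len(rows), block_size):
        chunk = rows[start:start + block_size]
        values = [value for _, value in chunk]
        blocks.append((min(values), max(values), chunk))
    return blocks


def full_scan(blocks, lo, hi):
    """Return (matching doc IDs, number of blocks rejected without scanning)."""
    matches = []
    skipped = 0
    for block_min, block_max, chunk in blocks:
        # The whole block lies outside [lo, hi]: reject it early.
        if block_max < lo or block_min > hi:
            skipped += 1
            continue
        matches.extend(doc_id for doc_id, value in chunk if lo <= value <= hi)
    return matches, skipped


# Timestamps that grow along with document IDs (hypothetical data).
rows = [(doc_id, 1000 + doc_id * 10) for doc_id in range(1, 13)]
blocks = build_blocks(rows, block_size=4)
ids, skipped = full_scan(blocks, lo=1090, hi=1110)
print(ids, skipped)  # [9, 10, 11] 2
```

Because the values grow with the IDs, each block covers a narrow range, so most blocks fall entirely outside the filter and are never scanned row by row.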

As usual, the new snapshot is available from Downloads section.

Sep 19, 2007 - 0.9.8 grouping hotfix

Just a quick update that r820 snapshot which fixes an issue with non-working groupby is available. Older snapshots are removed as well.

Sep 19, 2007 - 0.9.8 dev news: server moved, geodistance feature added

Last week was especially hectic, with too much time being spent on emergent maintenance tasks. It all started with yet another round of server connectivity outages, which, combined with totally incompetent support, forced us to pull the plug and move everything to another hosting company. Then, as if according to Russian proverb that says “two relocations are equal to one fire”, the move went with its bumps too: most notably, it turned out that Plesk database migration tool drops auto-increment option on all table columns in all databases. Adding insult to injury, my desktop box had been hit by a worm which could not be fully cleaned by anti-virus software and I ended up reinstalling the system and software from scratch (system recovery did not work either).

The outcome is however nice and sweet. The new box is more powerful, and we also expect better (more stable) connectivity. At least there were no outages at all yet – while as of time of this writing the old box is down again. The new box is in Europe so it also works somewhat faster for me over SSH; hopefully the difference for US Web visitors will be negligible. And, on the sly, Sphinx source repository had been at last converted from CVS to SVN; one more small step towards making it public at some point in future.

The last but not least of the things falling in “maintenance” category; I've also created Feedburner-powered Sphinx RSS feed which syndicates this news page. Write it down in your moleskine.

But the update would not be complete without a new snapshot. And it comes with a nice new feature again. Enter geodistance.

Sphinx now supports float attributes, and is able to compute geographical distance between two points specified by latitude and longitude pairs (in radians). So you now can specify per-query “anchor point” (and attribute names to fetch per-entry latitude and longitude from), and then use “@geodist” virtual attribute both in the filters and in the sorting clause. In this case distance (in meters) from anchor point to each match will be computed, used for filtering and/or sorting, and returned as a virtual attribute too.
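For the curious, here is a rough Python sketch of that kind of computation. The formula Sphinx actually uses is not stated here, so the sketch assumes the haversine formula and a mean Earth radius of 6,371 km; the function names and sample points are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all inputs are in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(docs, anchor, max_meters):
    """Filter docs by distance from the anchor and sort nearest first.

    docs is a list of (doc_id, lat, lon); returns (doc_id, distance) pairs.
    """
    anchor_lat, anchor_lon = anchor
    found = []
    for doc_id, lat, lon in docs:
        dist = geodist(anchor_lat, anchor_lon, lat, lon)
        if dist <= max_meters:
            found.append((doc_id, dist))
    return sorted(found, key=lambda pair: pair[1])


docs = [(1, 0.0, 0.001), (2, 0.0, 0.01), (3, 0.0, 0.0005)]
nearby = within_radius(docs, anchor=(0.0, 0.0), max_meters=10_000)
print([doc_id for doc_id, _ in nearby])  # [3, 1]
```

Note that the inputs are in radians, as in the description above; coordinates stored in degrees would need math.radians() first.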

There also had been several bugfixes, most notably in SphinxSE which should now at least compile with every other MySQL version from 5.0.22 to 5.1.21 (I tested quite a few). Almost feature freeze time... almost.

Sep 07, 2007 - 0.9.8 dev news: multi-query support

My talk on Sphinx had been accepted for Highload '2007 Russian conference. So over last couple of weeks I was suddenly busy not only with the usual development cycle but writing the talk as well (it had to be submitted early to get to print materials). The result looks like a nice article so I'm going to translate and post it here later; somewhere after the conference (very end of September, or early October).

Anyway, the work on Sphinx continued. This time I'm rolling out only one new feature, but it is pretty major one: multi-query support. Updated snapshot is available from Downloads section.

The most obvious benefit from using multi-query (ie. sending multiple search queries to Sphinx at once) is saving network connection overheads and other round-trip costs. But what's much more important, it unlocks possibilities to optimize “related” queries internally.

One typical Sphinx usage pattern is to return several different “views” on the search results. For instance, one might need to display per-category match counts along with product search results, or maybe a graph of matches over time. Yes, that could be easily done earlier using the grouping features. However, one had to run the same query multiple times, but with different settings.

From now on, if you submit such queries through newly added multi-query interface (as a side note, ye good olde Query() interface is not going anywhere, and compatibility with older clients should also be in place), Sphinx notices that the full-text search query is the same and it is just sorting/grouping settings which are different. In this case it only performs expensive full-text search once, but builds several different (differently sorted and/or grouped) result sets from retrieved matches. I've seen speedups of 1.5-2 times on my simple synthetic queries; depending on different factors, the speedup could be even greater in practice.
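A toy model of that optimization might look like the Python sketch below. It does not use the real Sphinx API or engine: the document set, the query dictionaries, and the function names are all made up, and "full-text search" is reduced to a naive word match. The point is only that the expensive step runs once per distinct query string.

```python
from collections import Counter

# Hypothetical documents; a real index would of course be much larger.
DOCS = [
    {"id": 1, "text": "red apple", "category": 10, "price": 5},
    {"id": 2, "text": "green apple", "category": 20, "price": 3},
    {"id": 3, "text": "apple pie", "category": 10, "price": 8},
    {"id": 4, "text": "banana bread", "category": 20, "price": 4},
]

search_calls = 0  # counts how often the expensive step runs


def full_text_search(query):
    """Naive stand-in for the expensive full-text matching step."""
    global search_calls
    search_calls += 1
    words = query.split()
    return [d for d in DOCS if all(w in d["text"].split() for w in words)]


def run_queries(queries):
    """Run a batch, matching each distinct query string only once."""
    cache = {}
    results = []
    for q in queries:
        text = q["query"]
        if text not in cache:
            cache[text] = full_text_search(text)
        matches = cache[text]
        if "group_by" in q:
            # Per-group match counts, built from the shared matches.
            results.append(dict(Counter(d[q["group_by"]] for d in matches)))
        else:
            ordered = sorted(matches, key=lambda d: d[q["sort_by"]])
            results.append([d["id"] for d in ordered])
    return results


results = run_queries([
    {"query": "apple", "sort_by": "price"},
    {"query": "apple", "group_by": "category"},
])
print(results[0])     # [2, 1, 3]
print(results[1])     # {10: 2, 20: 1}
print(search_calls)   # 1
```

Two result sets come back, one sorted by price and one grouped by category, but the matching step ran only once.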

Aug 18, 2007 - Site updates again, 0.9.8 dev news, and a public snapshot

The last week was pretty busy again, being heavily packed with a number of urgent debugging and bugfixing sessions. Nevertheless, the changes to 0.9.8 do not only include bugfixes but a number of new features as well. Since the last update, the following features were implemented:

  • added an option to install searchd as a Windows service;
  • added an option to return all MVA values back to client through the API (another protocol update);
  • updated index merge to support changes in index format (notably, 64-bit IDs and MVAs).

Also there's a number of features already implemented in 0.9.8 that I forgot to mention last week, namely:

  • added support for bitfield attributes (reduces RAM usage);
  • added ranged query throttling;
  • added an option to mlock() precached data (to prevent it from being swapped out);
  • added seamless rotation option (earlier, queries could stall for a few seconds while new data was precached).

There was moderate interest in 0.9.8 snapshots so this time I am also publishing a new snapshot in Downloads section. It must be considered alpha quality at this point, but on the other hand there already were reports of successful production use. The snapshots are going to be updated from time to time; notices will appear both here and in the mailing list, so if you did not yet subscribe for some reason, it's a good time to.

As you could notice website just had another major update too:

It's possible to view the issues anonymously but you will need to self-register in the tracker in order to be able to report and/or monitor bugs (ie. receive e-mail notifications whenever the bug is updated). Available bug statuses and severities are explained here in Mantis doc; and here are the instructions on reporting Sphinx bugs. So everyone's encouraged to grab a copy of 0.9.8-cvs; give it a try; report any bugs using the tracker... and full sail to 0.9.8-release!

Aug 09, 2007 - Site updates, and 0.9.8 teaser

Finally convinced myself to postpone everything else and spend some time on updating the website. There's a mailing list now, and “Powered by” page is finally updated with a number of installations which better show off Sphinx capabilities.

With the advent of said mailing list, there now is a plan to post development updates here somewhat more regularly, ie. weekly or bi-weekly.

To start those updates, I'll shortly overview new features which are already implemented in Sphinx 0.9.8 (version currently in development). They include

  • improved support for attribute updates;
  • added support for 64-bit document and word IDs (so 4 billion document limit is now removed);
  • added support for libstemmer which brings French, Spanish, Portuguese, Italian, German, Dutch, Swedish, Norwegian, Danish, Finnish, Hungarian and Turkish stemming support to Sphinx in addition to built-in English and Russian stemming;
  • added support for multiple-valued attributes (ie. you can now attach multiple integers, such as tag IDs, to each document);
  • added support for synonyms/exceptions; it is now possible to map several different keyword combinations to one normal form (eg. map both “MS Windows” and “Microsoft Windows” to “Windows”) and also to index exceptions with special chars (such as “OS/2” or “C++”).

Even though 0.9.8 is not yet ready to be tagged “release” because of missing docs for the new features and some not-yet-implemented bits, it still seems stable enough already to be at least tried for production use. As usual, everyone interested is welcome to request a fresh snapshot. (If there's a lot of interest I think I'll be forced to tag it RC1 and release “as is”).

Jul 11, 2007 - Off on vacation

I'm leaving for one-week vacation next evening and am not sure if the hotel will provide Internet access; so please be patient if my replies are delayed.

May 21, 2007 - Terabyte barrier broken

It just came to my attention that BoardReader.com (forums search engine) indexes over 1200 GB of text data in 700,000,000 documents with Sphinx. Hats off to the team!

Apr 26, 2007 - Potential e-mail issues

I recently installed a greylisting filter to fight spam; but have just noticed that there was at least one email from a legitimate mail server which was never retried and never delivered. I confirmed with its sender and it seems that there never was a delivery error report on his side, either. So if you wrote to me and I did not reply in a timely manner, you might be hitting this (mail server configuration) issue as well. In that case, please contact me through Contacts page; messages sent through that page are guaranteed to pass, and your email will get auto-whitelisted on my reply. Sorry for the inconvenience.

Apr 19, 2007 - Off to UC

My train to Moscow leaves in 1 hour. I'll fly to LA tomorrow, spend a weekend there and fly to MySQL UC then; so please be patient with the emails. Thanks again.

Apr 15, 2007 - To all correspondents

A quick note for everyone who tried to contact me over last week - it was really bad flu; I'm back online now (effective today); I will try to answer all of my email and forum posts over next 1 or 2 days; so if you don't get a reply then please do resend your message. Thanks!

Apr 02, 2007 - Sphinx 0.9.7 released

Sphinx 0.9.7 is now available in Downloads section.

The wait is over.

Most interesting new features added since RC2 include:

  • separate groups sorting clause in group-by mode;
  • SphinxSE with full 0.9.7 features support (practically rewritten);
  • support for 1-grams, prefix and infix indexing;
  • somewhat improved documentation.

There's of course much more to this release, but the complete change log is too large to be posted here, so you'll probably have to download the tarball again.

Mar 25, 2007 - MySQL UC 2007

I can finally confirm that I will be showcasing Sphinx at MySQL Conference and Expo 2007 in DotOrg Pavilion this year again. We will also try to arrange BOF session on full text search in databases in general and Sphinx in particular; the updates will be posted here (and in conference schedule as well, hopefully). Everyone's invited!

Dec 15, 2006 - Sphinx 0.9.7-RC2 released

Sphinx 0.9.7 RC2 is now available in Downloads section.

Major new features include:

  • extended query mode with boolean, field limits, phrases, and proximity support (eg.: @title "hello world"~10 | @body example program);
  • extended sorting mode (eg.: @weight DESC @id ASC);
  • combined phrase+statistical ranking which takes words frequencies into account (currently in extended mode only);
  • official Python API;
  • contributed Perl and Ruby APIs.

Most of those had already been tested in production by some early adopters - thanks for all the support guys!

Also the SE has been updated to support MySQL 5.0.27 and 5.1.14 out of the box.

Seems that it's a little unfair to call this RC2; but as long as documentation and MySQL SE are still not up-to-date (work on SE is in progress right now), something deep inside prevents me from dubbing it “release”.

Am I reinventing Google Beta or something...

Oct 26, 2006 - Sphinx 0.9.7-RC1 released

Sphinx 0.9.7 RC1 is now available in Downloads section.

My initial plans to release 0.9.7 in September slipped due to the most trivial reason. There was a number of major bugs uncovered during the beta testing; so most of October was dedicated to fixing those. Anyway, it seems pretty stable now, so it's time for RC1.

As long as it's RC1, some things are still missing; most notably, documentation and MySQL SE are not yet updated with all the latest changes.

However, major changes alone (not to mention all the numerous fixes and improvements) should outweigh that documentation and SE lag. Those are, namely:

  • multiple per-document attributes support;
  • match grouping and per-group counts support;
  • search time optimizations (times faster than 0.9.6 in some cases).

As always, I'm looking forward to your feedback, so it's about time to get yourself a copy and try it!

Sep 06, 2006 - Project news

Although the news page is not getting updated frequently, the work on 0.9.7 goes on. Most of this time was dedicated to implementing support for multiple per-document attributes instead of the pre-0.9.7 hard-coded group and timestamp couple. However, there also was a number of different search time optimizations. Combined with an option to store per-document info externally, which improves indexing time, this should make 0.9.7 the fastest version so far.

Besides that, some R&D is also being done in the area of implementing query language and improving relevance ranking. These are far from final at this point, and might not even make it into next release, but preliminary results seem promising.

As you probably already noticed, the site also just had a minor update, with most notable change being the PayPal donations button. Many thanks to the guy who helped me with that one, you know who you are.

I plan to release 0.9.7 later this month, so stay tuned.

Jul 24, 2006 - Sphinx 0.9.6 released

Sphinx 0.9.6 is now available in Downloads section. It's mostly a (minor) bugfix release, but a couple of useful features had been added as well.

Jun 27, 2006 - Updated storage engine documentation

Updated Sphinx storage engine documentation is now available online. Please do report any bugs you notice there and, as usual, don't hesitate to contact me if you have any questions!

Jun 26, 2006 - Sphinx 0.9.6-RC1 released

Sphinx 0.9.6 RC1 is now available in Downloads section. It's dubbed RC1, because there still are some new features I'd like to add, but I decided that the amount of new features which is already there deserves to see the light. Most interesting features would be:

  • distributed searching support;
  • boolean queries;
  • and last but not least, MySQL storage engine!

Apr 22, 2006 - MySQL UC 2006

I am pleased to announce that I will be speaking at FULLTEXT topics BOF session at MySQL Users Conference 2006. I know that it's pretty late to invite everyone there, but would like to do so anyway! :)

Apr 13, 2006 - Project news

This place is also going to hold intra-release news on Sphinx development; a kind of a blog. To start, let's tease everyone with the upcoming 0.9.6 release. Its feature list might not be that big, but there was a lot of behind the scenes work since 0.9.5.

searchd network protocol was rewritten from scratch. It is now binary, and designed to be extensible and backwards compatible. This is intended to help those who run a lot of sites on one Sphinx, because updating Sphinx should no longer immediately require updating sphinxapi.php on each site.

0.9.6 also adds distributed searching capabilities (which entailed the network protocol rewrite). searchd can now not only search local indexes but query remote agents as well, properly merging all the results. It can actually query itself too, which means that this feature can be used both for making a search cluster with 2+ different machines and parallelizing search on 1 multi-CPU box.
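The merging step can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions, not how searchd is implemented: each result set is assumed to be a list of (doc_id, weight) pairs already sorted by descending weight, and the function name and data are hypothetical.

```python
import heapq
from itertools import islice


def merge_results(result_sets, limit):
    """Merge result sets sorted by descending weight; keep the top `limit`."""
    merged = heapq.merge(*result_sets, key=lambda match: match[1], reverse=True)
    return list(islice(merged, limit))


# One local index plus two remote agents (made-up matches).
local = [(101, 9.0), (102, 4.0)]
agent_a = [(201, 7.5), (202, 1.0)]
agent_b = [(301, 8.0)]

top = merge_results([local, agent_a, agent_b], limit=3)
print(top)  # [(101, 9.0), (301, 8.0), (201, 7.5)]
```

The same merge works whether the agents are other machines or the same searchd querying itself, which is what makes the single-box parallel setup possible.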

There's also an ongoing work on document excerpt generation (with, of course, search terms highlighting) which I want to be completed in 0.9.6. It's implemented as an additional feature in searchd accessible through sphinxapi.

As for the bugfixes, the most noticeable fix is for that annoying bug in the config parser, which replicated the last value if the value was empty. Example config was also fixed, so the provided example should be working smoothly out of the box.

Apr 13, 2006 - Site opened

Publicly available Sphinx site is finally online now. Please update your bookmarks.


Copyright © Andrew Aksyonoff, 2001-2007