A continuing look at putting the Sphinx search engine to use.
Last week we looked at the Sphinx search engine from a high-level point of view. Now let’s look at getting and building the code, some basic setup, and how to build and query indexes.
Get, Build, Install
In order to build Sphinx, you will need the MySQL client libraries (libmysql) and header files (libmysql-dev) as well as the expat library (libexpat) and header files (libexpat-dev) installed in standard locations. Once you have those, you can grab the latest version of the Sphinx trunk from the Google Code Subversion repository. If you prefer, you can use a tarball from the Sphinx site, but new releases arrive only every few months, while the code itself evolves more rapidly as bugs are fixed and features are added.
svn checkout http://sphinxsearch.googlecode.com/svn/trunk/ sphinxsearch-read-only
cd sphinxsearch-read-only
./configure
make
sudo make install
You’ll end up with several binaries in /usr/local/bin:
- indexer: reads data from MySQL or an XML import file and produces the full-text indexes. This can also be used to merge and rotate indexes.
- searchd: the Sphinx search daemon, which listens on TCP port 3312 for connections and handles queries.
- search: used for running ad-hoc queries from the command-line directly against the indexes (does not use searchd).
- spelldump: tool for extracting ispell dictionary data
- indextool: tool used for producing information about the indexes, such as the index header, list of document ids, and the “hit list” for a given keyword.
We’ll see how to use several of those tools in this article and the next.
General Sphinx Configuration
Installing Sphinx also deposits a few files in /usr/local/etc that serve as starting points for configuration. sphinx-min.conf.dist is an excellent minimal configuration file to start with. I won’t reproduce it entirely here (you can see the pre-build version in the Subversion repository), but it defines a few things to get you started.
There is a source called src1 and a corresponding index called test1. In Sphinx, you specify various information about an index separately from the data source definition. That keeps the index definition apart from the mechanics of how to get the data. Sphinx has back ends that know how to extract structured data from MySQL, Drizzle (coming soon), ODBC data sources, and arbitrary XML files or command pipelines (known as xmlpipe, or xmlpipe2 in recent versions).
There’s an indexer section that specifies how much memory the indexer should limit itself to. The default is 32MB, but you can go as high as 2GB, which is quite useful if you have very large document sets to index.
Finally, there’s a searchd section, which contains options for the actual search daemon: port number, log file location, timeouts, etc.
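To make that four-part layout concrete, here is a minimal Python sketch that prints a skeleton with one section of each kind. It is an illustration rather than a copy of sphinx-min.conf.dist: the file paths are hypothetical, and a real MySQL source also needs connection settings and a query, which are left out here.

```python
# Sketch only: renders the overall shape of a Sphinx config file.
# Paths are hypothetical; a usable mysql source needs more settings.

def render_section(kind, name, settings):
    """Render one 'kind name { key = value ... }' block as text."""
    header = kind if name is None else "%s %s" % (kind, name)
    body = "\n".join("    %s = %s" % (key, value) for key, value in settings.items())
    return "%s\n{\n%s\n}\n" % (header, body)


def render_conf(sections):
    """Join all sections with a blank line between them."""
    return "\n".join(render_section(*section) for section in sections)


SECTIONS = [
    # Where the data comes from.
    ("source", "src1", {"type": "mysql"}),
    # How it is indexed; points back at the source by name.
    ("index", "test1", {"source": "src1", "path": "/var/data/test1"}),
    # Limits for the indexer program.
    ("indexer", None, {"mem_limit": "32M"}),
    # Options for the search daemon.
    ("searchd", None, {"port": "3312", "log": "/var/log/searchd.log"}),
]

if __name__ == "__main__":
    print(render_conf(SECTIONS))
```

Note that the index names its source, not the other way around, which is what lets you swap the data source later without touching the rest of the index definition.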
Index and Data Configuration
Next we need to tell Sphinx what our data looks like and how it should be indexed. The SQL Data Sources section of the Sphinx manual explains this fairly well if your data lives in MySQL and can easily be queried to get at the documents you’d like to index. However, things are often more complicated. The documents may need some sort of cleanup or pre-index processing before Sphinx reads them. That means telling the Sphinx indexer to use the xmlpipe2 “driver,” which reads full XML documents from a pipe. That XML input stream also contains a short header that specifies the “schema” for the index; the schema does not live in the Sphinx configuration file as it does for MySQL-based sources.
An example looks like this:
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
  <sphinx:schema>
    <sphinx:field name="title"/>
    <sphinx:field name="body"/>
    <sphinx:attr name="size" type="int" bits="32"/>
  </sphinx:schema>
  <sphinx:document id="1">
    <title>First Post</title>
    <body>This is not slashdot!</body>
    <size>21</size>
  </sphinx:document>
  <!-- ... more documents ... -->
</sphinx:docset>
That would define an index with two full-text indexed fields (title and body) as well as an attribute (size) that can be used in sorting, ranking, and filtering query results.
Your sphinx.conf would then need a small section like this:
source src1_pipe
{
    type = xmlpipe2
    xmlpipe_command = perl /home/example/xmlbuilder.pl
}
You’d then change the source setting of the test1 index from src1 to src1_pipe so that the index uses the new source. When you run the indexer, it executes the xmlbuilder.pl Perl script and reads its stdout as XML.
While there’s a bit more overhead involved in the initial setup of an index using xmlpipe2, it’s a lot more powerful. It allows you to perform arbitrary conversions and cleanup on the data before Sphinx sees it, does not tie you to a particular database table or schema, and so on.
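The article’s xmlbuilder.pl is not shown, so here is a rough sketch of what such a generator might look like, written in Python rather than Perl. It assumes the title/body/size schema from the example above. The sample documents and the choice to compute size as the body length in characters are assumptions made for illustration; a real script would pull and clean documents from wherever they actually live.

```python
#!/usr/bin/env python3
"""Sketch of an xmlpipe2 generator: writes a sphinx:docset stream to stdout."""
import sys
from xml.sax.saxutils import escape, quoteattr

# Hypothetical sample data standing in for your real documents.
DOCUMENTS = [
    {"id": 1, "title": "First Post", "body": "This is not slashdot!"},
    {"id": 2, "title": "Q&A", "body": "Is 1 < 2? Yes."},
]


def render_docset(documents):
    """Return the complete xmlpipe2 stream for the given documents."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',
        '<sphinx:field name="body"/>',
        '<sphinx:attr name="size" type="int" bits="32"/>',
        "</sphinx:schema>",
    ]
    for doc in documents:
        lines.append("<sphinx:document id=%s>" % quoteattr(str(doc["id"])))
        # escape() replaces &, < and > so the stream stays well-formed.
        lines.append("<title>%s</title>" % escape(doc["title"]))
        lines.append("<body>%s</body>" % escape(doc["body"]))
        # Assumption: the size attribute is the body length in characters.
        lines.append("<size>%d</size>" % len(doc["body"]))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(render_docset(DOCUMENTS))
```

If you saved this under a hypothetical name such as /home/example/xmlbuilder.py, the xmlpipe_command line would invoke it with python3 instead of perl. Escaping is the step most easily forgotten: a single stray ampersand in a document body is enough to make the stream malformed.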
Building an Index
Assuming you’ve created a working xmlbuilder.pl, indexing your documents is very straightforward. You simply run indexer and tell it which index to build (or all indexes).
/usr/local/bin/indexer test1
Or:
/usr/local/bin/indexer --all
The amount of time required to build an index depends on the size of the data, the complexity of the index, CPU speed, and several other variables. By default, indexer produces some status output while running and provides some summary data when it finishes. It’s not unusual to see indexing rates as high as 10,000 documents per second on modern hardware.
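If you want to kick off indexing from a script (a nightly job, say), a thin wrapper is enough. The sketch below is one possible shape, not part of Sphinx: the function names are made up, and it assumes that a nonzero exit status from indexer signals a failure, which is the usual convention but is not something this article confirms.

```python
import shlex
import subprocess

INDEXER = "/usr/local/bin/indexer"  # install location used in this article


def indexer_args(index=None):
    """Build the argument list: one named index, or --all when none is given."""
    return [INDEXER, index if index else "--all"]


def run_indexer(index=None):
    """Run indexer, letting its status output through, and return the exit status.

    Assumption: a nonzero status means the build failed.
    """
    return subprocess.run(indexer_args(index)).returncode


# Show the command lines without actually running anything.
print(shlex.join(indexer_args("test1")))
print(shlex.join(indexer_args()))
```

Because the output is not captured, the status and summary lines that indexer prints still show up in your terminal or in the job’s log.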
Query the Index
Once the index is built, you should run a few queries against it to make sure that it is finding documents and nothing unusual is happening. The search command-line utility is helpful for this. You can search for all documents matching one or more keywords:
/usr/local/bin/search keyword1 keyword2
If you have multiple indexes configured, you can restrict the search to a single index using the -i command-line argument:
/usr/local/bin/search -i test1 keyword1 keyword2
In either case, search will produce a list of the documents that contain the search term(s). By default it performs an “and” query, meaning that it only matches documents that contain all the terms (if you specify more than one). However, the -a or --any option tells search to match documents that contain any of the keywords.
In all of those cases, we’re searching all fields in the index: title and body. To specify a single field, you can use some fancier syntax as part of an extended mode match. You simply prefix the search term(s) with @fieldname like this:
/usr/local/bin/search -i test1 -e @title keyword1
That would find all documents which contain keyword1 in the title.
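The options covered above combine in a predictable way, which the following Python sketch tries to capture. The helper search_args is hypothetical; it only assembles a command line from the -i, -a, and -e options described in this article and does not run anything. Refusing to combine -a with a field restriction is a choice made for this sketch, since both select a matching mode.

```python
import shlex

SEARCH = "/usr/local/bin/search"  # install location used in this article


def search_args(keywords, index=None, match_any=False, field=None):
    """Assemble a search command line from the options shown above."""
    if match_any and field:
        # Both pick a matching mode, so this sketch refuses to mix them.
        raise ValueError("choose either match_any or field, not both")
    args = [SEARCH]
    if index:
        args += ["-i", index]          # restrict to a single index
    if match_any:
        args.append("-a")              # match any keyword instead of all
    if field:
        args += ["-e", "@" + field]    # extended mode, limited to one field
    return args + list(keywords)


# Print a few command lines you could paste into a shell.
print(shlex.join(search_args(["keyword1", "keyword2"])))
print(shlex.join(search_args(["keyword1", "keyword2"], index="test1", match_any=True)))
print(shlex.join(search_args(["keyword1"], index="test1", field="title")))
```

The last line reproduces the field-restricted query shown above. shlex.join also quotes any keyword that contains spaces or shell metacharacters, which is handy when the search terms come from somewhere other than your own keyboard.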
More to Come…
Now you’ve seen the basics of setting up a simple document index using Sphinx. Next week we’ll look at running the searchd daemon, running queries against it from PHP and Perl, and digging into some of the advanced features that may be interesting.