This tutorial is for webmasters and programmers. By practicing simple tasks at the command line, you will learn the basics of how to:
To run the tutorial, you will need:
System requirements:
The following would be helpful before you start, but are not required:
The Tutorial shows the process for setting up an index and performing some example queries on a fictitious web site that includes a forum where members discuss video games.
Background Concepts: See Sign-Up, Pricing, and Billing in the FAQ.
Result:
The IndexDen dashboard appears, showing your new account:
Background Concepts: See What is an index? in the FAQ.
Background Concepts: See What languages do you support? in the FAQ.
Result:
If you run the command to view the contents of the directory (ls on Unix or Mac OS X; dir on Windows), you should see the file indextank_client.py.
C:\> python
>>>
>>> import indextank.client as itc
>>> api_client = itc.ApiClient('YOUR_API_URL')
Background Concepts: See How do I get my data to you? in the FAQ.
>>> test_index = api_client.get_index('test_index')
>>> test_index.add_document('post1', {'text':'I love Bioshock'})
>>> test_index.add_document('post2', {'text':'Need cheats for Bioshock'})
>>> test_index.add_document('post3', {'text':'I love Tetris'})
Here we call the add_document() method in the client library three times to index three posts in the video gamer forum.
Result:
test_index now contains:
| Doc ID | Field | Value |
|---|---|---|
| post1 | text | I love Bioshock |
| post2 | text | Need cheats for Bioshock |
| post3 | text | I love Tetris |
Background Concepts: See What types of queries work with IndexDen? in the FAQ
>>> test_index.search('Bioshock')
{'matches': 2,
'facets': {},
'search_time': '0.070',
'results': [{'docid': 'post2'},{'docid': 'post1'}]}
>>> test_index.search('love Bioshock')
{'matches': 1,
'facets': {},
'search_time': '0.005',
'results': [{'docid': 'post1'}]}
>>> test_index.search('Bioshock OR Tetris')
{'matches': 3,
'facets': {},
'search_time': '0.007',
'results': [{'docid': 'post3'},{'docid': 'post2'},{'docid': 'post1'}]}
>>> test_index.search('love', fetch_fields=['text'])['results']
[{'text': 'I love Tetris', 'docid': 'post3'},
{'text': 'I love Bioshock', 'docid': 'post1'}]
>>> test_index.search('love', snippet_fields=['text'])['results']
[{'snippet_text': 'I love Tetris', 'docid': 'post3'},
{'snippet_text': 'I love Bioshock', 'docid': 'post1'}]
Background Concepts: See the discussion of field names in What is an index? in the FAQ.
So far, we have worked with simple document index entries that contain only a single field, text, containing the complete text of the document. Let's redefine the documents now and add some more fields to enable more targeted searching.
>>> test_index.add_document('post1', {'text':'I love Bioshock', 'game':'Bioshock'})
>>> test_index.add_document('post2', {'text':'Need cheats for Bioshock', 'game':'Bioshock'})
>>> test_index.add_document('post3', {'text':'I love Tetris', 'game':'Tetris'})
| Doc ID | Field | Value |
|---|---|---|
| post1 | text | I love Bioshock |
| post1 | game | Bioshock |
| post2 | text | Need cheats for Bioshock |
| post2 | game | Bioshock |
| post3 | text | I love Tetris |
| post3 | game | Tetris |
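The query forms used in this tutorial (several words that must all match, alternatives joined by OR, and a field:term prefix such as game:Tetris) can be pictured with a small in-memory model. This is only a sketch and not how IndexDen is implemented: `DOCS`, `term_matches`, and `toy_search` are hypothetical names, it assumes that a term without a prefix is looked up in the text field, and it returns matches sorted by document ID rather than ranked.

```python
# Toy in-memory model of the example index; NOT the IndexDen client or query engine.
DOCS = {
    "post1": {"text": "I love Bioshock", "game": "Bioshock"},
    "post2": {"text": "Need cheats for Bioshock", "game": "Bioshock"},
    "post3": {"text": "I love Tetris", "game": "Tetris"},
}


def term_matches(fields, term):
    """Return True if one query term matches a document's fields."""
    # Assumption: a term without a "field:" prefix is looked up in the "text" field.
    field, _, word = term.rpartition(":")
    field = field or "text"
    words = fields.get(field, "").lower().split()
    return word.lower() in words


def toy_search(query):
    """Return sorted doc IDs; terms are AND-ed, groups separated by ' OR ' are OR-ed."""
    alternatives = [alt.split() for alt in query.split(" OR ")]
    return sorted(
        doc_id
        for doc_id, fields in DOCS.items()
        if any(all(term_matches(fields, t) for t in alt) for alt in alternatives)
    )


print(toy_search("love Bioshock"))       # ['post1']
print(toy_search("Bioshock OR Tetris"))  # ['post1', 'post2', 'post3']
print(toy_search("game:Tetris"))         # ['post3']
```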
>>> test_index.search('game:Tetris', fetch_fields=['text'])['results']
[{'text': 'I love Tetris', 'docid': 'post3'}]
Background Concepts: See What are scoring functions? in the FAQ.
A scoring function is a mathematical formula that you can reference in a query to influence the ranking of search results. Scoring functions are named with integers starting at 0 and going up to 5. Function 0 is the default and will be applied if no other is specified; it starts out with an initial definition of -age, which sorts query results from most recently indexed to least recently indexed (newest to oldest).
Function 0 uses the timestamp field that IndexDen provides for each document. The time is recorded as the number of seconds since the Unix epoch. IndexDen automatically sets each document's timestamp to the current time when the document is indexed, but you can override this timestamp. To make this scoring function tutorial easier to follow, that's what we are going to do.
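If you want to build your own timestamp values, the Python standard library can convert between dates and seconds since the epoch. The sketch below is independent of IndexDen; the helper names `to_epoch_seconds` and `from_epoch_seconds` are made up for illustration, and the date is simply the one that corresponds to the value 1286673129 used in this tutorial.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def from_epoch_seconds(seconds):
    """Convert seconds since the Unix epoch back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# 10 October 2010, 01:12:09 UTC, expressed as an epoch timestamp.
release = datetime(2010, 10, 10, 1, 12, 9, tzinfo=timezone.utc)
timestamp = to_epoch_seconds(release)
print(timestamp)  # 1286673129
```

Using a timezone-aware datetime avoids surprises from the local time zone of the machine running the code.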
>>> test_index.add_document('newest',{'text': 'New release: Fable III is out','timestamp':1286673129})
>>> test_index.add_document('not_so_new',{'text': 'New release: GTA III just arrived!','timestamp':1003626729})
>>> test_index.add_document('oldest',{'text': 'New release: This new game Tetris is awesome!','timestamp':455332329})
>>> test_index.search('New release')
{'matches': 3,
'facets': {},
'search_time': '0.002',
'results': [{'docid': 'newest'},{'docid': 'not_so_new'},{'docid': 'oldest'}]}
>>> test_index.add_function(0,'age')
>>> test_index.search('New release')
{'matches': 3,
'facets': {},
'search_time': '0.005',
'results': [{'docid': 'oldest'},{'docid': 'not_so_new'},{'docid': 'newest'}]}
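The effect of switching function 0 from -age to age can be reproduced with a toy ranking in plain Python. This is only a sketch of the idea, not IndexDen's scoring engine: `rank_by` and `NOW` are hypothetical, and it assumes that age means the seconds elapsed since the document's timestamp and that higher scores rank first.

```python
# Toy model of ranking by a scoring formula; NOT IndexDen's scoring engine.
TIMESTAMPS = {
    "newest": 1286673129,
    "not_so_new": 1003626729,
    "oldest": 455332329,
}


def rank_by(formula, now):
    """Return doc IDs ordered from highest to lowest score."""
    # Assumption: "age" is the number of seconds between the timestamp and `now`.
    scores = {doc_id: formula(now - ts) for doc_id, ts in TIMESTAMPS.items()}
    return sorted(scores, key=scores.get, reverse=True)


NOW = 1300000000  # hypothetical "current time", later than every timestamp above

print(rank_by(lambda age: -age, NOW))  # ['newest', 'not_so_new', 'oldest']
print(rank_by(lambda age: age, NOW))   # ['oldest', 'not_so_new', 'newest']
```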
>>> test_index.add_document('post4', {'text': 'When is Duke Nukem Forever coming out? I need my Duke.'})
>>> test_index.add_document('post5', {'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'})
>>> test_index.add_document('post6', {'text': 'People who love Duke Nukem also love our great product!'})
>>> test_index.add_function(1,'relevance')
>>> test_index.search('duke', scoring_function=1, fetch_fields=['text'])['results']
[{'docid': 'post5',
'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'},
{'docid': 'post4', 'text': 'When is Duke Nukem Forever coming out? I need my Duke.'},
{'docid': 'post6', 'text': 'People who love Duke Nukem also love our great product!'}]
Background Concepts: See What is a scoring function? in the FAQ.
In addition to textual information, each document can have up to three (3) document variables to store any numeric data you would like. Each variable is referred to by number, starting with variable 0. Document variables provide additional useful information to create more subtle and effective scoring functions.
For example, assume that in the video game forum, members can vote for posts that they like. The forum application keeps track of the number of votes. These vote totals can be used to push the more popular posts up higher in search results.
Let's also assume that the forum software assigns a spam score by examining each new post for evidence that it is from a legitimate forum member and contains relevant content, and then assigning a confidence value from 0 (almost certainly spam) to 1 (high confidence that the post is legitimate).
>>> test_index.add_document('post4', {'text': 'When is Duke Nukem Forever coming out? I need my Duke.'}, variables={0:10, 1:1.0})
>>> test_index.add_document('post5', {'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'}, variables={0:1000, 1:0.9})
>>> test_index.add_document('post6', {'text': 'People who love Duke Nukem also love our great product!'}, variables={0:1, 1:0.05})
>>> test_index.add_function(2, 'relevance * log(doc.var[0]) * doc.var[1]')
>>> test_index.search('duke', scoring_function=2, fetch_fields=['text'])['results']
[{'docid': 'post5', 'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'},
{'docid': 'post4', 'text': 'When is Duke Nukem Forever coming out? I need my Duke.'},
{'docid': 'post6', 'text': 'People who love Duke Nukem also love our great product!'}]
>>> test_index.update_variables('post4',{0:1000000})
>>> test_index.search('duke', scoring_function=2, fetch_fields=['text'])['results']
[{'docid': 'post4', 'text': 'When is Duke Nukem Forever coming out? I need my Duke.'},
{'docid': 'post5', 'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'},
{'docid': 'post6', 'text': 'People who love Duke Nukem also love our great product!'}]
Learn More: Scoring Functions
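To see why raising the vote count of post4 moves it to the top, the formula relevance * log(doc.var[0]) * doc.var[1] can be evaluated by hand. The sketch below does this in plain Python. The relevance values are invented for illustration (IndexDen computes the real ones), the logarithm is assumed to be the natural log, and updating variable 0 is assumed to leave variable 1 unchanged.

```python
import math

# Hypothetical relevance values; the real ones are computed by the search engine.
RELEVANCE = {"post4": 2.0, "post5": 4.0, "post6": 1.0}


def score(relevance, votes, spam_confidence):
    """Toy version of 'relevance * log(doc.var[0]) * doc.var[1]'."""
    # Assumption: natural logarithm; the ordering is the same for any base above 1.
    return relevance * math.log(votes) * spam_confidence


def rank(variables):
    """Order doc IDs from highest to lowest score for the given variables."""
    scores = {
        doc_id: score(RELEVANCE[doc_id], var[0], var[1])
        for doc_id, var in variables.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


variables = {
    "post4": {0: 10, 1: 1.0},
    "post5": {0: 1000, 1: 0.9},
    "post6": {0: 1, 1: 0.05},
}
print(rank(variables))  # ['post5', 'post4', 'post6']

# Mirror update_variables('post4', {0: 1000000}), assuming variable 1 is kept.
variables["post4"][0] = 1000000
print(rank(variables))  # ['post4', 'post5', 'post6']
```

Note that post6 scores exactly zero in this model, because it has a single vote and log(1) is 0.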
If you're 100% confident something should not be in the index, it makes sense to remove it.
>>> test_index.delete_document('post6')
>>> test_index.search('duke', scoring_function=2, fetch_fields=['text'])['results']
[{'docid': 'post4', 'text': 'When is Duke Nukem Forever coming out? I need my Duke.'},
{'docid': 'post5', 'text': 'Duke Nukem is my favorite game. Duke Nukem rules. Duke Nukem is awesome. Here are my favorite Duke Nukem links.'}]
You can pass variables with a query and use them as input to a scoring function. This is useful, for example, to customize results for a particular user. Suppose we're dealing with the search on the forum site. It makes sense to index each poster's gamerscore as a document variable, so that a scoring function can compare it with the gamerscore of the user who is searching.
>>> test_index.add_document('post1', {'text':'I love Bioshock'}, variables={0:115})
>>> test_index.add_document('post2', {'text':'Need cheats for Bioshock'}, variables={0:2600})
>>> test_index.add_document('post3', {'text':'I love Tetris'}, variables={0:19500})
>>> test_index.add_function(1, 'relevance / max(1, abs(query.var[0] - doc.var[0]))')
>>> test_index.search('bioshock', scoring_function=1, fetch_fields=['text'], variables={0: 25})['results']
[{'docid': 'post1', 'text': 'I love Bioshock'},
{'docid': 'post2', 'text': 'Need cheats for Bioshock'}]
>>> test_index.search('love', scoring_function=1, fetch_fields=['text'], variables={0: 15000})['results']
[{'docid': 'post3', 'text': 'I love Tetris'},
{'docid': 'post1', 'text': 'I love Bioshock'}]
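The ordering in these two searches follows from the formula relevance / max(1, abs(query.var[0] - doc.var[0])): the closer a poster's gamerscore is to the searcher's, the smaller the divisor and the higher the score. The sketch below recomputes this in plain Python. It is not IndexDen code; `rank` is a hypothetical helper, and every match is assumed to have the same relevance so that only the gamerscore distance matters.

```python
# Toy model of a scoring function that uses a query variable; NOT IndexDen's engine.
GAMERSCORES = {"post1": 115, "post2": 2600, "post3": 19500}


def score(relevance, query_var, doc_var):
    """Toy version of 'relevance / max(1, abs(query.var[0] - doc.var[0]))'."""
    return relevance / max(1, abs(query_var - doc_var))


def rank(matching_ids, searcher_gamerscore, relevance=1.0):
    """Order matching doc IDs from highest to lowest score."""
    # Assumption: every match has the same hypothetical relevance.
    return sorted(
        matching_ids,
        key=lambda doc_id: score(relevance, searcher_gamerscore, GAMERSCORES[doc_id]),
        reverse=True,
    )


print(rank(["post1", "post2"], 25))     # ['post1', 'post2']
print(rank(["post1", "post3"], 15000))  # ['post3', 'post1']
```

The max(1, ...) guard keeps the divisor from reaching zero when the two gamerscores are equal.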
Now that you have learned some of the basic functionality of IndexDen, you are ready to go more in-depth:
Enjoy using IndexDen to improve the quality of search on your website.