Submitting Content

Data is submitted to the Search Appliance with an HTTP POST request. The target URL is the same as the admin interface URL (e.g. http://.../dowalk), but with /recvdata.xml appended. E.g.:

http://www.example.com/texis/dowalk/recvdata.xml

The following POST variables must be set in the request. Be sure to URL-encode the values:

  • profile Set to the name of the receiving profile.

  • data Set to an XML document containing the data, and what to do with it (insert/delete/etc.). See below for details.
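As a rough illustration of the request described above, the following Python sketch builds (but does not send) such a POST using only the standard library. The function name build_submit_request, the profile name myprofile, and the minimal delete document are assumptions made for this example, not values taken from the Search Appliance documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_submit_request(base_url, profile, data_xml):
    """Build (but do not send) a POST request for recvdata.xml."""
    # Append /recvdata.xml to the admin interface URL.
    url = base_url.rstrip("/") + "/recvdata.xml"
    # urlencode() URL-encodes each value and joins them as form fields.
    body = urlencode({"profile": profile, "data": data_xml}).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical profile name and a minimal document; which fields a real
# delete request needs beyond Type and Url is an assumption here.
req = build_submit_request(
    "http://www.example.com/texis/dowalk",
    "myprofile",
    "<ThunderstoneReplication><Item><Type>D</Type>"
    "<Url>http://www.example.com/dir/page.html</Url>"
    "</Item></ThunderstoneReplication>",
)
```

Passing req to urllib.request.urlopen() would send it; how the appliance responds is not covered here.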

Uploading content

Below is an example data document where all fields are specified. Be sure to HTML-encode values (e.g. write & as &amp; and < as &lt;).

<?xml version="1.0" encoding="UTF-8"?>
<ThunderstoneReplication
      xmlns:dt="urn:schemas-microsoft-com:datatypes"
>
  <Item>
    <Type>I</Type>
    <Size>150369</Size>
    <Visited>2005-10-25 15:25:18</Visited>
    <Dlsecs>0</Dlsecs>
    <Depth>0</Depth>
    <Url>http://www.example.com/dir/page.html</Url>
    <Title>Sprocket Specifications</Title>
    <Body>...</Body>
    <Keywords>sprockets, gears, hubs</Keywords>
    <Description>Sprocket details</Description>
    <Meta></Meta>
    <Category>Mechanical</Category>
    <Modified>2005-10-25 11:21:07</Modified>
    <NextCheck>2005-10-25 16:25:18</NextCheck>
    <Views>0</Views>
    <Clicks>0</Clicks>
    <CTR>0.000000</CTR>
    <Pop>0</Pop>
    <MimeType>text/html</MimeType>
    <Charset>UTF-8</Charset>
    <Refs dt:dt="bin.base64">...</Refs>
    <Errors dt:dt="bin.base64">...</Errors>
    <RawData dt:dt="bin.base64"></RawData>
  </Item>
</ThunderstoneReplication>

Any element whose text data might not be XML-safe (e.g. binary characters in the <Body>) should be base64-encoded, with the attribute dt:dt="bin.base64" set on its tag. For example, the text data of the <Refs> and <Errors> elements is always base64-encoded. Whenever this attribute is used, the XML namespace prefix dt should also be declared as urn:schemas-microsoft-com:datatypes in the root <ThunderstoneReplication> element.
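One possible way to produce such an element is sketched below with Python's standard xml.etree.ElementTree and base64 modules. The helper name add_base64_child and the sample bytes are assumptions for illustration only.

```python
import base64
import xml.etree.ElementTree as ET

DT_NS = "urn:schemas-microsoft-com:datatypes"
# Register the prefix so serialization emits xmlns:dt and dt:dt
# rather than an auto-generated prefix such as ns0.
ET.register_namespace("dt", DT_NS)


def add_base64_child(parent, tag, raw_bytes):
    """Append <tag dt:dt="bin.base64"> holding base64 of raw_bytes."""
    child = ET.SubElement(parent, tag)
    child.set("{%s}dt" % DT_NS, "bin.base64")
    child.text = base64.b64encode(raw_bytes).decode("ascii")
    return child


root = ET.Element("ThunderstoneReplication")
item = ET.SubElement(root, "Item")
# Sample bytes that would not be XML-safe as plain text.
add_base64_child(item, "Body", b"binary \x00\x01 content")

# ElementTree places the xmlns:dt declaration on the root element.
xml_text = ET.tostring(root, encoding="unicode")
```

Note that tostring() with encoding="unicode" omits the <?xml ...?> declaration, so it would need to be prepended if the receiver expects it.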

The elements are:

  • <Type> The action to take with this data. Text value may be one of:

    • I - Insert the data (overwrite all previous data for URL, if any)

    • D - Delete the URL

    • DP - Delete the URL as a pattern (e.g. http://www.example.com/dir/*)

    • U - Update the URL, leaving unspecified fields unchanged

    • UI - Update search indexes (call after a batch of inserts/deletes)

  • <Size> The integer size of the original document.

  • <Visited> When the document was fetched, in YYYY-MM-DD HH:MM:SS format.

  • <Dlsecs> Number of seconds taken to download the document.

  • <Depth> Depth of URL from a Base URL, e.g. 0 is a Base URL, 1 is one click away, etc.

  • <Url> The URL of the document.

  • <Title> The title of the document.

  • <Body> The formatted body of the document.

  • <Keywords> Any keywords for the document.

  • <Description> The description of the document.

  • <Meta> Any meta data for the document.

  • <Category> The category the document is in, if any. Must be a category name from the profile's Categories.

  • <Modified> The Last-Modified date of the document in YYYY-MM-DD HH:MM:SS format.

  • <NextCheck> When the document should be refreshed, in YYYY-MM-DD HH:MM:SS format.

  • <Views> Number of views of the document: how many times it has been shown in search results.

  • <Clicks> Number of clicks on the document: how many times it has been clicked on in search results.

  • <CTR> Click-through ratio: floating-point ratio of clicks to views.

  • <Pop> Document popularity: number of references (links) to it.

  • <MimeType> The MIME type of the content served at the URL, or provided in RawData.

  • <Charset> Character set of the <Body> data. Should correspond to the profile's Storage Charset setting. If a charset other than the Storage Charset is used, it should be a standard IANA charset that the Search Appliance can convert to the Storage Charset.

  • <Refs> Optional element with references (child links) of the document.

  • <Errors> Optional element with errors of the document.
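To make the element list more concrete, here is a hedged Python sketch that assembles a single-item document for a given <Type> action, formats dates as YYYY-MM-DD HH:MM:SS, and relies on ElementTree to escape special characters. The function build_item_document is hypothetical, and the assumptions that element order is not significant and that an update or pattern delete needs only the fields shown are not confirmed by the text above.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Date format used by <Visited>, <Modified> and <NextCheck>.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_item_document(action, url=None, fields=None):
    """Return a one-<Item> document string for the given <Type> action."""
    root = ET.Element("ThunderstoneReplication")
    item = ET.SubElement(root, "Item")
    ET.SubElement(item, "Type").text = action
    if url is not None:
        ET.SubElement(item, "Url").text = url
    for tag, value in (fields or {}).items():
        if isinstance(value, datetime):
            value = value.strftime(TIME_FORMAT)
        # ElementTree escapes &, < and > in text when serializing.
        ET.SubElement(item, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# U: update only the listed fields, leaving the others unchanged.
update_doc = build_item_document(
    "U",
    "http://www.example.com/dir/page.html",
    {"Title": "Sprockets & Gears",
     "Modified": datetime(2005, 10, 25, 11, 21, 7)},
)
# DP: delete every URL matching the pattern.
delete_doc = build_item_document("DP", "http://www.example.com/dir/*")
```

After a batch of such inserts or deletes, a document with <Type> UI would be submitted to update the search indexes.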

Copyright © Thunderstone Software     Last updated: Nov 8 2024