7.9 Caching frequently accessed results


As we have been discussing, reducing the work done on each user interaction is an important way to reduce the load on the system and allow it to scale better. One way to achieve this is to cache frequently accessed results. If you run the same queries again and again with the same arguments, you have a candidate for caching. Examples of cacheable pages include tables of contents, category pages, and lists of the most popular items.

How often you need to update the cache depends on the application, as does the method used to fill and purge it. If you have a relatively static application, such as our Open Directory project, where you might update the database weekly, you can build all the category pages during the data load, since they will remain constant for the week. This is especially worthwhile if the pages take a lot of effort to build, as the Open Directory ones do.
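The build-at-load approach can be sketched as follows. This is a minimal illustration in Python rather than Vortex, using the standard sqlite3 module as a stand-in for a Texis table. The function names, the "/category/" path prefix, and the two-column cache table are assumptions made for the example, not part of the Open Directory application.

```python
import sqlite3

def render_category(name, items):
    """Hypothetical renderer standing in for an expensive page build."""
    lines = [f"<h1>{name}</h1>", "<ul>"]
    lines += [f"<li>{item}</li>" for item in sorted(items)]
    lines.append("</ul>")
    return "\n".join(lines)

def rebuild_category_cache(conn, categories):
    """Replace every cached category page right after a data load."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (Path TEXT PRIMARY KEY, Html TEXT)"
        )
        # Drop last week's pages, then store the freshly built ones.
        conn.execute("DELETE FROM cache")
        conn.executemany(
            "INSERT INTO cache (Path, Html) VALUES (?, ?)",
            [("/category/" + name, render_category(name, items))
             for name, items in categories.items()],
        )

def lookup(conn, path):
    """Serve a page straight from the cache; None means it was never built."""
    row = conn.execute(
        "SELECT Html FROM cache WHERE Path = ?", (path,)
    ).fetchone()
    return row[0] if row else None
```

Because every page is rebuilt in one pass, requests between loads never pay the build cost. The trade-off is that the load itself takes longer.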

In other cases you may want to wait until a page is first generated before caching it, and possibly keep it for a more limited time. An example would be a product detail page, similar to an Amazon page, generated from a product table, which is editable; a vendor table, which is relatively static; and user comments, which are dynamic. You might find that a large number of products are rarely looked at, while a few popular ones account for most of the requests. If a product is changed you can delete its cached page and allow it to regenerate automatically on the next request, and you might also delete entries that are older than some limit. The script below illustrates a simple caching proxy written in Vortex.

<SCRIPT LANGUAGE=vortex>
<A NAME=main>
<$rqp=($SCRIPT_NAME+$PATH_INFO+ '?' +$QUERY_STRING)>
<SQL MAX=1 "select Html from cache where Path=$rqp">
	<send $Html>
	<exit>
</SQL>
<$machines=www1 www2 www3>
<randpick $machines>
<$hrq=( 'http://' + $ret+$rqp)>
<fetch $hrq><$Html=$ret>
<send $Html>
<SQL NOVARS "insert into cache (id, Html, Path)
	values(counter, $Html, $rqp)"></SQL>
</A>
</SCRIPT>

The script doing periodic deletes can use the id field to target old entries, since the counter value stored at insert time reflects when the row was created. For example, delete from cache where id < '-10 minutes'; removes entries more than 10 minutes old.
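The three operations described here (fill on first request, delete when the underlying data changes, purge by age) can be sketched together. This is an illustration in Python, not a translation of the Vortex script. The PageCache class, its method names, the Created timestamp column, and the in-memory sqlite3 database are all assumptions made for the example. A plain timestamp plays the role that the counter id plays in the Texis table.

```python
import sqlite3
import time

class PageCache:
    """Cache pages on first request; support invalidation and age-based purge."""

    def __init__(self, build_page, max_age_seconds=600, clock=time.time):
        self.build_page = build_page      # called only on a cache miss
        self.max_age = max_age_seconds
        self.clock = clock                # injectable so it can be tested
        # isolation_level=None puts the connection in autocommit mode.
        self.db = sqlite3.connect(":memory:", isolation_level=None)
        self.db.execute(
            "CREATE TABLE cache (Path TEXT PRIMARY KEY, Html TEXT, Created REAL)"
        )

    def get(self, path):
        row = self.db.execute(
            "SELECT Html FROM cache WHERE Path = ?", (path,)
        ).fetchone()
        if row:
            return row[0]                 # hit: no page build needed
        html = self.build_page(path)      # miss: build once, then store
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (path, html, self.clock()),
        )
        return html

    def invalidate(self, path):
        """Call when the underlying record changes, e.g. a product is edited."""
        self.db.execute("DELETE FROM cache WHERE Path = ?", (path,))

    def purge_old(self):
        """Periodic cleanup: drop entries older than max_age seconds."""
        cutoff = self.clock() - self.max_age
        self.db.execute("DELETE FROM cache WHERE Created < ?", (cutoff,))
```

Rarely viewed pages are never built unless someone asks for them, and popular pages are built once per cache lifetime. Running purge_old from a scheduled job keeps the table from growing without bound.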

In some cases you may not want to cache entire pages, but just parts of pages, based on particular variables. This is easy to do in Texis with <capture>, which collects what would otherwise be output into a Vortex variable. The captured text can then be sent and stored, much like the results of <fetch> above.
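The idea of capturing output and caching it per fragment can be illustrated outside Vortex. The Python sketch below uses contextlib.redirect_stdout as a rough analogue of <capture>. The cached_fragment helper, the dictionary used as a store, and the key layout are assumptions made for the example and do not describe how Vortex implements capture.

```python
import contextlib
import io

# Hypothetical in-process store; a real application might use a database table.
fragment_cache = {}

def cached_fragment(key, render):
    """Return the text printed by render(), running it only on a cache miss."""
    if key not in fragment_cache:
        buffer = io.StringIO()
        # Collect everything render() prints instead of sending it out.
        with contextlib.redirect_stdout(buffer):
            render()
        fragment_cache[key] = buffer.getvalue()
    return fragment_cache[key]

def render_vendor_box(vendor):
    """Hypothetical fragment: the relatively static vendor part of a page."""
    print(f'<div class="vendor">{vendor}</div>')
```

A page can then be assembled from pieces with different lifetimes. For instance, cached_fragment(("vendor", "Acme"), lambda: render_vendor_box("Acme")) builds the vendor box once, while the dynamic comment section is generated on every request.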
