[debian-devel:12100] Re: Status of search engines


 Forwarded from the debian-www@Org list:

In article <20000414110001.B15590@xxxxxxxxxxxxxxxxxxxxxx>,
  at "Fri, 14 Apr 2000 11:00:01 +1000",
   with "Status of search engines",
 csmall@xxxxxxxxxxxxxxxxxxxxxx (Craig Small) writes:

> G'day All,
>   Here is the status of searching for search engines.
> Ferret
> ======
> This was the old verisim search engine that is going/is to be GPLed.  It
> uses a lot of perl and currently needs some file location tidying up
> and general debian package cleaning.
> I don't believe it does file based indexing (as opposed to through a
> webserver), though that may be me not understanding how it works.
> The index file is about 1:1 the size of the archive.
> Udmsearch
> =========
> A new search engine that has a C program for the indexer, uses a
> database and pretty much anything for the retriever.  Does support file
> access and incremental but currently doesn't understand when files have
> changed.
> Currently a lintian clean-ish debian package. The postgresql database is
> about 1:1 the size of the archive.
> id-utils
> ========
> A very simple indexer with no web-based retrieval.  Doesn't (yet) have
> the idea of what html looks like or weights but that is being worked on.
> Very fast indexing and very small index files, but they may grow with
> the features.
> Its biggest drawback is that it doesn't have little summaries of the
> page.
> Namazu
> ======
> I had great difficulties in getting this working for me. It apparently
> does 1:3 index files.
> I think there was another suggestion but cannot find it.
In article <20000413222231.A27536@xxxxxxxxxxxxxxxx>,
  at "Thu, 13 Apr 2000 22:22:31 -0400",
   with "Re: Status of search engines",
 "James A. Treacy" <treacy@debian.org> writes:

> On Fri, Apr 14, 2000 at 11:00:01AM +1000, Craig Small wrote:
> > 
> > I think there was another suggestion but cannot find it.
> > 
> There is htdig. It does local indexing, but by parsing the html itself.
> Since it doesn't understand content negotiation it can't follow our link
> structure. :( If it weren't for that, we could use it for the entire
> site. As it is, it could be used for the list archives.
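
The content negotiation problem James mentions can be illustrated with
a rough sketch. It assumes pages are stored on disk with a language
suffix (for example about.en.html) while links omit it (for example
intro/about), so an indexer reading files directly has to do the
mapping the web server would normally do. The function name, the
suffix scheme, and the language list below are all hypothetical; this
is not how htdig or any of the engines above actually works.

```python
from pathlib import Path


def resolve_negotiated(root, link, languages=("en",)):
    """Map a link to a file on disk, trying language variants in order.

    Assumption: variants are named <page>.<lang>.html, and a link to a
    directory means its "index" page. Returns None if nothing matches.
    """
    base = Path(root) / link.lstrip("/")
    if base.is_dir():
        # A directory link refers to that directory's index page.
        base = base / "index"
    if base.is_file():
        # The link already names a real file; no negotiation needed.
        return base
    for lang in languages:
        candidate = base.with_name(f"{base.name}.{lang}.html")
        if candidate.is_file():
            return candidate
    return None
```

With something like this, a file-based indexer could follow a link
such as intro/about to intro/about.en.html instead of stopping at it.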
Namazu is hard to use (difficult to configure), and on top of that the size of its index files

 I seem to recall that the Debian Project also has mailing lists that
are mainly in European languages other than English, such as French,
German, and Portuguese. Won't i18n support become necessary for those?

# I don't know the details, so if this is to be discussed I'll leave it
# to the experts. How about mailing Craig directly and asking something
# like "What difficulties did you run into?"

