
[debian-devel:11905] Re: New Search Engine?



Hi.

In article <20000316175118.K22023@xxxxxxxxxx>,
  at Thu, 16 Mar 2000 17:51:18 -0800,
    on Re: New Search Engine?,
 "Darren O. Benham" <gecko@debian.org> writes:

> Barring problems from -admin, I'd suggest running it on master.. but don't
> link it into the web pages...
> 
> Here are two thoughts:
> 
> 1) Do the indices get created incrementally, or do they get
> recreated on each run?
> 
> 2) What is the size of the indices going to be?

There was a thread on the debian-www list about using namazu (namazu2)
to search the Web pages. The indexing work can also be split across
several machines.

# namazu is packaged for Debian and maintained by kitame@debian.org

namazu can do incremental indexing, and the index can be created
on a machine other than the web server.

NOKUBI (knok@debian.or.jp) wrote:

  | First, re-indexing is not as heavy as the initial indexing. It is
  | "difference indexing": untouched files are not processed again.
  | 
  | Second, namazu/namazu2 can handle multiple index files, so the
  | indexing work can be divided (and could be spread over several machines).
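
As a rough illustration of the "difference indexing" idea above, here is
a minimal Python sketch. It is NOT namazu's actual code: the function
name files_to_reindex and the choice of modification time as the change
test are my own assumptions, used only to show why a re-index run can
skip untouched files.

```python
import os


def files_to_reindex(root, previous_mtimes):
    """Find files that are new or modified since the previous run.

    Hypothetical helper for illustration only (not part of namazu).
    previous_mtimes maps path -> modification time (ns) recorded last run.
    Returns (changed_paths, current_mtimes).
    """
    current = {}
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            current[path] = mtime
            # Untouched files keep the same mtime and are skipped.
            if previous_mtimes.get(path) != mtime:
                changed.append(path)
    return sorted(changed), current
```

On the first run previous_mtimes is empty, so every file is returned.
On later runs only new or modified files are returned, and the saved
mapping is replaced by current_mtimes.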

Time required to create the initial index with namazu:

BTS (www.jp.debian.org/Bugs/)
Size (bytes):        139,887,588
Total Documents:     16,748
Total Keywords:      1,352,486
Time (sec):          12,566
File/Sec:            1.33

debian-devel  (Debian Project, www.jp.debian.org/Lists-Archives/debian-devel-nnnn)
Size (bytes):        281,988,180
Total Documents:     60,399
Total Keywords:      579,385
Time (sec):          16,163
File/Sec:            3.74

debian-user   (Debian Project, www.jp.debian.org/Lists-Archives/debian-user-nnnn)
Size (bytes):        363,076,366
Total Documents:     89,959
Total Keywords:      743,521
Time (sec):          27,283
File/Sec:            3.30

debian-users-jp (Debian JP Project, www.debian.or.jp/Lists-Archives/debian-users/)
Size (bytes):        90,384,962
Total Documents:     20,800
Total Keywords:      413,239
Time (sec):          5,805
File/Sec:            3.58

debian-devel-jp (Debian JP Project, www.debian.or.jp/Lists-Archives/debian-devel/)
Size (bytes):        54,418,491
Total Documents:     11,642
Total Keywords:      328,629
Time (sec):          3,062
File/Sec:            3.80
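
The File/Sec figures above are simply Total Documents divided by
Time (sec). A small Python sketch that recomputes them from the numbers
in this mail follows. The names runs and files_per_second are mine,
chosen for illustration.

```python
# (total documents, indexing time in seconds), copied from the figures above
runs = {
    "BTS": (16748, 12566),
    "debian-devel": (60399, 16163),
    "debian-user": (89959, 27283),
    "debian-users-jp": (20800, 5805),
    "debian-devel-jp": (11642, 3062),
}


def files_per_second(documents, seconds):
    """Indexing throughput: documents processed per second."""
    return documents / seconds


for name, (documents, seconds) in runs.items():
    rate = files_per_second(documents, seconds)
    hours = seconds / 3600
    print(f"{name:16s} {rate:5.2f} files/sec  ({hours:.1f} hours)")
```

The same division gives a first estimate of how long an initial index
of a new archive might take, assuming a similar throughput.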

KITAME (kitame@debian.org) wrote:

 | I re-index every day at 04:00 JST. Please check
 | 
 |  http://sakura.debian.or.jp/~kitame/mknmz.logs/
 |  http://sakura.debian.or.jp/~kitame/mknmz.logs/summary/
 |
 | These files are updated at every re-indexing.

Currently our server (sakura.debian.or.jp) has some hardware trouble,
but I hope we will get a new machine in several weeks. I think we can
then provide the index needed for searching Debian's Web pages.

-- 
  Taketoshi Sano: <sano@debian.org>,<sano@debian.or.jp>,<kgh12351@xxxxxxxxxxx>