DICT.TW Online Dictionary

DICT@FreeBSD: Dictionary Database Compilation Procedure


Chienwen, DICT.TW

2006.02.07
update:2007.04.04

    Before you begin:

  1. You must know how to use a computer, including how to power it on and off.

  2. FreeBSD is already installed, and the ports tree is up to date.

  3. Apache HTTP Server is already installed and can run CGI programs.

  4. The DICT server and client are already installed. (See: DICT@FreeBSD setup procedure)

  5. You must be familiar with Perl, which is used here to convert file formats. (Any other language with equivalent capabilities will also do.)


  6. Install the program:

  7. Install dictfmt, the dictionary database compiler, from ports:

    # cd /usr/ports/textproc/dictfmt ; make install


  8. File format concepts:

  9. dictfmt accepts the following seven predefined input formats:
    FORMATTING OPTIONS

    -c5 FILE is formatted with headwords preceded by 5 or more underscore characters (_) and a blank line. All text until the next headword is considered the definition. Any leading `@' characters are stripped out, but the file is otherwise unchanged. This option was written to format the CIA WORLD FACTBOOK 1995.
     
    -t -c5, --without-info and --without-headword options are implied. Use this option, if an input database comes from dictunformat utility.
     
    -e FILE is in html format, with the headword tagged as bold. (<B>headword - </B>)
    This option was written to format EASTON'S 1897 BIBLE DICTIONARY. A typical entry from Easton is:

    <A NAME="T0000005">
    <B>Abagtha - </B>
    one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

    This is converted to:
    Abagtha
       one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).

    The heading <A NAME="T0000005"> is omitted, and the headword `Abagtha' is indexed.

    NOTE: This option should be used with caution. It removes several html tags (enough to format Easton properly), but not all. The Makefile that was originally written to format dict-easton uses sed scripts to modify certain cross reference tags. It may be necessary to pipe the input file through a sed script, or hack the source of dictfmt in order to properly format other html databases.
     
    -f FILE is formatted with the headwords starting in column 0, with the definition indented at least one space (or tab character) on subsequent lines. The third line starting in column 0 is taken as the first headword, and the first two lines starting in column 0 are treated as part of the 00-database-info header. This option was written to format the F.O.L.D.O.C.
     
    -h FILE is formatted with the headwords starting in column 0, followed by a comma, with the definition continuing on the same line. All text before the first single character line is included in 00-database-info header, and lines with only one character are omitted from the .dict file. The first headword is on the line following the first single character line. The headword is indexed; the text of the file is not changed. This option was written to format HITCHCOCK'S BIBLE NAMES DICTIONARY.
     
    -j FILE is formatted with headwords starting in col 0, enclosed in colons, followed by the definition. The colons surrounding the headword are removed, and the headword is indexed. Lines beginning with '*', '=', or '-' are also removed. All text before the first headword is included in the headers. This option was written to format the JARGON FILE.

    NOTE: Some recent versions of the JARGON FILE had three blanks inserted before the first colon at each headword. These must be removed before processing with dictfmt. (sed scripts have been used for this purpose. ed, awk, or perl scripts are also possible.)
     
    -p FILE is formatted with `%h' in column 0, followed by a blank, followed by the headword, optionally followed by a line containing `%d' in column 0. The definition starts on the following line. The first line beginning '%h' and any lines beginning '%d' are stripped from the .dict file, and '%h ' is stripped from in front of the headword. All text before the first headword is included in the headers. The second line beginning '%h' is taken as the first headword. This option was written to format Jay Kominek's elements database.
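
To make the -p layout concrete, here is a minimal shell sketch that writes a tiny input file in that format and counts its headword lines. The file name and the two entries are made up for illustration; only the %h / %d layout and the leading 00-database-info entry come from the description above.

```shell
#!/bin/sh
# Write a minimal dictfmt -p input file. The file name and the entries
# are hypothetical, for illustration only.
sample="${TMPDIR:-/tmp}/sample-p-format.txt"

cat > "$sample" <<'EOF'
%h 00-database-info
%d
Sample database, two entries.
%h apple
%d
A round fruit.
%h pear
%d
A fruit that narrows toward the stem.
EOF

# Every entry starts with "%h " in column 0; the first one is the header,
# so it is not counted as a headword entry.
total=$(grep -c '^%h ' "$sample")
entries=$((total - 1))
echo "headword entries: $entries"
```

A file laid out like this is what dictfmt expects on standard input when it is called with -p.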
     
    For more on dictfmt, see man dictfmt.
    If the original data file is in none of these formats, convert it with Perl.


  10. Perl example:

  11. Taking the Catholic English-Chinese Pocket Dictionary (天主教英漢袖珍辭典) as an example, the HTML pages can be converted to the dictfmt -p format with Perl.

  12. Purpose of the program:

    Convert HTML such as:
    <p class="style9"><span class="style11">AAS </span>:教廷公報;宗座公報。全名是 Acta Apostolicae Sedis 。 </p>
    into dictfmt -p input:
    %h AAS
    %d
    <b>AAS</b>
    教廷公報; 宗座公報。 全名是 Acta Apostolicae Sedis 。

  13. Source code: (file name ./catholic.pl )

    #!/usr/bin/perl
    # 2006.10.14

    use strict ;

    &journal() ;

    ####

    sub journal {

        my @all_option = () ;    # every line of every downloaded page
        my @file_array = qw /a b c d e f g h i j k l m n o p q r s t u v w xyz/ ;

        foreach my $file (@file_array) {
            open ( my $in, "<", "download/$file.htm" ) or die "Failed to open download/$file.htm: $!" ;
            my @FileData = <$in> ;
            close ($in) ;
            push @all_option, @FileData ;
        }

        open ( TXT, ">catholic.txt" ) or die "Failed to open catholic.txt: $!" ;

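    # Database header entry. The two text lines are Chinese data: the dictionary
    # title and publisher (New Year's Day 2001) and the source URL. Save this
    # script in Big5 so the bytes match --locale zh_TW.Big5 in make.sh.
    print TXT <<__END ;
    %h 00-database-info
    %d
    天主教英漢袖珍辭典 - 由主徒會恒毅月刊社出版, 2001 年元旦。
    來源: http://stteresa.catholic.org.hk/website/catechumenate/dictionary/
    __END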

        foreach my $data (@all_option) {

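            # Byte pairs such as \xA1\x47 are Big5 full-width punctuation; hex
            # escapes keep the patterns independent of this file's encoding.
            # The \s* before </span> trims the trailing blank from the headword.
            if ( $data =~ /.*?\<p class=\".*?\"\>\<span class=\".*?\"\>(.*?)\s*\<\/span\>\s*\xA1\x47(.*?)\s*\<\/p\>.*/ )
            {
            my $head = $1 ;
            my $def = $2 ;

            # Replace full-width punctuation with ASCII equivalents.
            $def =~ s/\xA1\x46/; /g ;          # full-width semicolon
            $def =~ s/\xA1\x47/: /g ;          # full-width colon
            $def =~ s/\xA1\x41/, /g ;          # full-width comma
            $def =~ s/\xA1\x5D/ (/g ;          # full-width left parenthesis
            $def =~ s/\xA1\x5E/) /g ;          # full-width right parenthesis
            $def =~ s/\xA1\x48/? /g ;          # full-width question mark
            $def =~ s/\xA1\x49/! /g ;          # full-width exclamation mark
            $def =~ s/\xA1\x43/\xA1\x43 /g ;   # ideographic full stop, then a space

            $def = "<b>" . $head . "</b>\n" . $def ;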

            print TXT "%h $head\n%d\n$def\n" ;
            }
        }
        close (TXT);
    }
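
Before compiling, it can help to confirm that the generated catholic.txt really follows the -p layout. The sketch below is an optional check and not part of the original procedure. The helper name check_p_format is made up, and the check is stricter than dictfmt itself, which treats the %d line as optional: here every %h line must be directly followed by a %d line, as catholic.pl always writes it.

```shell
#!/bin/sh
# check_p_format FILE (hypothetical helper): succeed only if FILE has at
# least one "%h " line and each one is directly followed by a "%d" line.
check_p_format() {
    awk '
        prev_h && $0 != "%d" { bad++ }      # the line after a %h must be %d
        { prev_h = ($0 ~ /^%h /) }          # remember whether this is a %h line
        prev_h { heads++ }
        END {
            if (prev_h) bad++               # file ended right after a %h line
            printf "%d headwords, %d malformed\n", heads, bad
            exit (heads == 0 || bad > 0)
        }
    ' "$1"
}

# Usage: check the converter output if it exists in the current directory.
if [ -f catholic.txt ]; then
    check_p_format catholic.txt
fi
```

The reported count includes the 00-database-info header entry, so it is one higher than the number of real headwords.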


  14. Compile the data:

  15. Write the build script: (file name ./make.sh )

    #!/bin/sh
    # 2006.10.14

    perl catholic.pl

    dictfmt --locale zh_TW.Big5 --allchars -p -u http://jesus.tw \
        --columns 80 --without-headword \
        -s "Catholic DICT" \
        catholic < catholic.txt

    dictzip catholic.dict

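  16. Download the web pages of the Catholic English-Chinese Pocket Dictionary ( A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, XYZ ) into the ./download directory, saved as a.htm through w.htm and xyz.htm, the names catholic.pl reads.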
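
Since catholic.pl stops at the first page it cannot open, a quick check of the download directory can save a failed run. This is a minimal sketch, assuming the pages are saved under the lowercase names the script reads; the helper name missing_pages is made up for illustration.

```shell
#!/bin/sh
# missing_pages DIR (hypothetical helper): print the name of every page
# that catholic.pl expects in DIR but that is absent or empty.
missing_pages() {
    for name in a b c d e f g h i j k l m n o p q r s t u v w xyz; do
        [ -s "$1/$name.htm" ] || echo "$name.htm"
    done
}

missing=$(missing_pages ./download)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "missing or empty pages:" $missing
else
    echo "all 24 pages present"
fi
```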

  17. Run:

    # sh make.sh
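
After the run, the current directory should contain catholic.index and catholic.dict.dz. The sketch below only checks that both files exist and are non-empty. The helper name built_ok is hypothetical, and the check says nothing about whether the contents are correct.

```shell
#!/bin/sh
# built_ok NAME (hypothetical helper): succeed only if NAME.index and
# NAME.dict.dz both exist and are non-empty in the current directory.
built_ok() {
    [ -s "$1.index" ] && [ -s "$1.dict.dz" ]
}

if built_ok catholic; then
    echo "catholic: index and compressed dictionary are ready"
else
    echo "catholic: build output missing, check the make.sh messages" >&2
fi
```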


  18. Load the database into dictd:

  19. Run:

    # cp catholic.dict.dz /usr/local/lib/dict/
    # cp catholic.index /usr/local/lib/dict/

  20. Edit /usr/local/etc/dictd.conf and add these settings:

    database catholic  { data "/usr/local/lib/dict/catholic.dict.dz"
                         index "/usr/local/lib/dict/catholic.index" }

  21. Restart dictd:

    # /usr/local/etc/rc.d/dictd.sh restart


