注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

陈钟鸣的博客

独立之精神,自由之思想.

 
 
 

日志

 
 

Oracle Text Application(自用备忘)用了6个小时,眼睛好累  

2008-05-27 16:45:57|  分类: oracle |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |
1 Understanding Oracle Text Application Development
    理解Oracle文本应用设计
    1.1 What is Oracle Text?
        什么是Oracle Text?
        Oracle Text is a technology that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.  
        Oracle Text是一种技术.它能让用户创建text查询应用与文档分类应用.Oracle Text提供索引,关键字与标题查找,并且能显示text容量.
        
    1.2 Designing Your Application
        设计你的应用
        To design your Oracle Text application, you must determine the type of queries you expect to execute. Doing so enables you to choose the most suitable index for the task. We can divide application queries into three different categories:
        设计你的Oracle Text应用,你必须决定你想选择哪种查询,你有以下选择:
        *Text Queries on Document Collections
            在文档中查找文本.
        *Queries on Catalog Information
            查询目录信息
        *Document Classification
            文档分类信息
            
    1.3 Text Queries on Document Collections
        在文档中查找文本
        A text query application enables users to search document collections such as Web sites, digital libraries, or document warehouses.
        Searching is enabled by first indexing the document collection.
        The collection is typically static with no significant change in content after the initial indexing run.
        Documents can be of any size and of different formats such as HTML, PDF, or Microsoft Word. These documents are stored in a document table.
        文本查询应用允许用户一样在文档集中查找,例如,Web站点,数字图书馆,文档数据仓库.搜索时,允许用户首先建立文档索引.
        文档可以是任意大小,任何格式的,例如,HTML,PDF,WORD.这些文档存储在文档表中.
        Queries usually consist of words or phrases.
        Application users can specify logical combinations of words and phrases using operators such as OR and AND.
        Other query operations such as stemming, proximity searching, and wildcarding can be used to improve the search results.
        查询,通常是由关键字,句子组成.
        应用程序用户可以使用操作符指定词句之间的逻辑关系,如,OR,AND.
        还有一些操作符,如,stemming,proximity searching,与windcarding用于改善搜索结果.
        An important factor for this type of application is retrieving documents that are relevant to a user query while retrieving as few non-relevant documents as possible.
        The most relevant documents must be ranked high in the result list.
        The queries for this type of application are best served with a CONTEXT index on your document table. To query this index, your application uses the SQL CONTAINS operator in the WHERE clause of a SELECT statement
    
    1.3.1 Flowchart of Text Query Application
        全文检索应用的流程
        A typical text query application on a document collection enables the user to enter a query.
        The application issues a CONTAINS query and returns a list, called a hitlist, of documents that satisfy the query.
        The results are usually ranked by relevance. The application enables the user to view one or more documents in the hitlist.
        For example, an application might index URLs (HTML files) on the World Wide Web and provide query capabilities across the set of indexed URLs.
        Hitlists returned by the query application are composed of URLs that the user can visit.
        
        A query application can be modeled according to the following steps:
            一个查询应用可能分以下几步:
       1.The user enters a query.用户输入一个查询请求.
       2.The application executes a CONTAINS query.应用执行contains查询
       3.The application presents a hitlist.应用返回hitlist
       4.The user selects document from hitlist.用户从hitlist中选择文档.
       5.The application presents a document to the user for viewing.应用传送文档给用户.
   
2 Getting Started with Oracle Text
    开始
    2.1 Creating an Oracle Text User
        Before you can create Oracle Text indexes and use Oracle Text PL/SQL packages, you need to create a user with the CTXAPP role. This role enables you to do the following:
        你必须具有CTXAPP的权限
    *Create and delete Oracle Text indexing preferences
    *Use the Oracle Text PL/SQL packages

        To create an Oracle Text application developer user, do the following as the system administrator user:
        
        step 1:
        create user text identified by text;
        
        step 2:
        grant resource,connect,ctxapp to text;
        
        step 3:
        GRANT EXECUTE ON CTXSYS.CTX_CLS TO text;
        GRANT EXECUTE ON CTXSYS.CTX_DDL TO text;
        GRANT EXECUTE ON CTXSYS.CTX_DOC TO text;
        GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO text;
        GRANT EXECUTE ON CTXSYS.CTX_QUERY TO text;
        GRANT EXECUTE ON CTXSYS.CTX_REPORT TO text;
        GRANT EXECUTE ON CTXSYS.CTX_THES TO text;
        GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO text;
        
    2.2 Query Application Quick Tour
        Step 1 Connect as the New User
        
        conn text/text@coconet
        
        Step 2 Create your Text Table
        
        create table docs(id number primary key,text clob);
        
        Step 3 Load Documents into Table
        
        INSERT INTO docs VALUES(1, '<HTML>California is a state in the US.</HTML>');
        INSERT INTO docs VALUES(2, '<HTML>Paris is a city in France.</HTML>');
        INSERT INTO docs VALUES(3, '<HTML>France is in Europe.</HTML>');
        commit;
        
        Step 4 Create the CONTEXT index        
        
        CREATE INDEX idx_docs ON docs(text)
     INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
     ('FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.HTML_SECTION_GROUP');
        
        
        Step 5 Querying Your Table with CONTAINS
        
        SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'france', 1) > 0;

          SCORE(1)         ID TEXT
        ---------- ---------- ------------------------------------------------------
                 4          2 <HTML>Paris is a city in France.</HTML>
                 4          3 <HTML>France is in Europe.</HTML>
                 
        Step 5 Present the Document
        
            SET SERVEROUTPUT ON;
            DECLARE
              mklob CLOB;
              amt NUMBER := 40;
              line VARCHAR2(80);
            BEGIN
            CTX_DOC.MARKUP('idx_docs','3','France', mklob);
            DBMS_LOB.READ(mklob, amt, 1, line);
            DBMS_OUTPUT.PUT_LINE('FIRST 40 CHARS ARE:'||line);
            DBMS_LOB.FREETEMPORARY(mklob);
            END;
            
            
            FIRST 40 CHARS ARE:<HTML><<<France>>> is in Europe.</HTML>
            
        Step 6 Synchronize the Index After Data Manipulation
        
            数据被处理后,同步索引.
            
            INSERT INTO docs VALUES(4, '<HTML>Los Angeles is a city in California.</HTML>');
            INSERT INTO docs VALUES(5, '<HTML>Mexico City is big.</HTML>');
            
            EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');
            
            SELECT SCORE(2), id, text FROM docs WHERE CONTAINS(text, 'city', 2) > 0;
              SCORE(2)         ID TEXT
            ---------- ---------- -----------------------------------------------------
                     4          2 <HTML>Paris is a city in France.</HTML>
                     4          4 <HTML>Los Angeles is a city in California.</HTML>
                     4          5 <HTML>Mexico City is big.</HTML>

        
3  Indexing with Oracle Text    
    索引Oracle Text
    3.1 索引类型:
        context
            索引独立文档,如MS Word, HTML or plain text.
            支持所有的查询服务,支持分区表.
            查询操作符:ONTAINS
        concat
            索引混合文档,例如,基本表中有name,price,descriptions concat类型的索引可以对这些字段进行复合索引.
            支持:
            INDEX SET
            LEXER
            STOPLIST
            STORAGE
            WORDLIST (only prefix_index attribute supported for Japanese data)
            Format, charset, and language columns not supported.
            Table and index partitioning not supported.
            不支持表与索引分区,不支持高亮等
            
            查询操作符:CATSEARCH
            
        ...还有两种类型(CTXRULE,CTXPATH)与本文关系不大,略...
        
    3.2 Structure of the Oracle Text CONTEXT Index
        context索引的结构:
        
        关键字                        出现的文档
        DOG                            doc1,doc2,doc3
        
    3.3 Create a Context Index
        3.3.1 Context index and DML
            A context index is not transcational.When you perform inserts,updates,or deletes on the basetable,
            you must explicitly synchronize the index with CTX_DDL.SYNC_INDEX.
        3.3.2 Default CONTEXT Index Example
        
            create index myindex on docs(text) indextype ctxsys.context;
            
        3.3.3 creating CTXCAT sub-indexes
            create table auction(
                item_id number,
                title varchar2(100),
                category_id number,
                price number,
                bid_close date);
            To create your sub-indexes,create an index set to contain them:
            step 1:
            
                begin
                    ctx_ddl.create_index_set('auction_iset');
                end;
                
            step 2:    
                begin
                    ctx_ddl.add_index('auction_iset','price');
                    ctx_ddl.add_index('auction_iset','price,bid_close');
                end;
            CREATE INDEX auction_titlex ON AUCTION(title,price) INDEXTYPE IS CTXSYS.CTXCAT PARAMETERS ('index set auction_iset');
            
            insert into auction values(1,'camera sony',1,101,sysdate);
            insert into auction values(2,'video sony',1,90,sysdate);
            insert into auction values(3,'TV cangjia',1,100,sysdate);
            
            SELECT * FROM auction WHERE CATSEARCH(title, 'cangjia','price=100 order by bid_close')> 0;
            
                ITEM_ID TITLE                CATEGORY_ID      PRICE BID_CLOS
            ---------- -------------------- ----------- ---------- --------
                     3 TV cangjia                     1        100 08-05-27
                     
    
        3.3.4 索引维护
            1.查看没有被同步的索引
            insert into docs values(6,'i am a cat');
            
            SELECT pnd_index_name, pnd_rowid, to_char(pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp FROM ctx_user_pending;
            
            PND_INDEX_NAME                      PND_ROWID          TIMESTAMP
            ----------------------------------- ------------------ ------------------------
            IDX_DOCS                            AAAOdjAAEAAAAdeAAF 27-5月 -2008 13:12:24
            
            
            EXEC CTX_DDL.SYNC_INDEX('idx_docs', '2M');
            
            SELECT pnd_index_name, pnd_rowid, to_char(pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp FROM ctx_user_pending;
            
            レコードが選択されませんでした。
            
            2.索引同步
            
                begin
                    ctx_ddl.sync_index('myindex', '2M');
                end;

            3.碎片整理
                CTX_DDL.OPTIMIZE_INDEX
                
4.Querying with Oracle Text
    4.1 contains sql
        select score(1),title from news where contains(text,'oracle',1) > 0;
        select score(1),title
        from news
        where contains(text,'oracle',1)>0
        order by score(1) desc;
        
        declare
          rowno number := 0;
        begin
          for c1 in (SELECT SCORE(1) score, title FROM news
                      WHERE CONTAINS(text, 'oracle', 1) > 0
                      ORDER BY SCORE(1) DESC)
          loop
            rowno := rowno + 1;
            dbms_output.put_line(c1.title||': '||c1.score);
            exit when rowno = 10;
          end loop;
        end;
        /
        
        SELECT SCORE(1), title, issue_date from news
           WHERE CONTAINS(text, 'oracle', 1) > 0
           AND issue_date >= ('01-OCT-97')
           ORDER BY SCORE(1) DESC;
    
   4.2  CATSEARCH
   
       create table BOOKS (ID number, INFO varchar2(200), PUBDATE DATE);
 
        insert into BOOKS values(1,
        '<author>NOAM CHOMSKY</author><subject>CIVILRIGHTS</subject><language>ENGLISH</language><publisher>MITPRESS</publisher>',
        to_date('2003-11-01','yyyy-mm-dd'));
        
        insert into BOOKS values(2,
        '<author>NICANOR PARRA</author><subject>POEMS AND ANTIPOEMS</subject><language>SPANISH</language><publisher>VASQUEZ</publisher>',
        to_date('2001-02-1','yyyy-mm-dd'));
        
        insert into BOOKS values(1,
        '<author>LUC SANTE</author><subject>XMLDATABASE</subject><language>FRENCH</language><publisher>FREEPRESS</publisher>',
          to_date('2002-09-12','yyyy-mm-dd'));
        
        exec ctx_ddl.create_index_set('BOOK_INDEX_SET');
        exec ctx_ddl.add_index('BOOK_INDEX_SET','pubdate');
        exec ctx_ddl.create_section_group('BOOK_SECTION_GROUP','BASIC_SECTION_GROUP');
        exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','AUTHOR','AUTHOR');
        exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','SUBJECT','SUBJECT');
        exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','LANGUAGE','LANGUAGE');
        exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','PUBLISHER','PUBLISHER');
        
        create index books_index on books(info) indextype is ctxsys.ctxcat
          parameters('index set book_index_set section group book_section_group');
      
      select  id, info from books
        where catsearch(info,
        '<query><textquery grammar="context">NOAM within author and english within language</textquery></query>',
        'order by pubdate')>0;
        
    4.3  Querying with MATCHES
    4.4  CONTAINS Operators
        =,<=,>=,<,>,IN,BETWEEN
        
        Logical AND a b c
        Logical OR    a|b|c
        Logical Not a - b
        hyphen with no space a-b
        ""    "a b c"
        ()    (A B)|C
        
    4.5  Using a Thesaurus in Queries

5 Presenting Documents in Oracle Text
    5.1 Highlighting Query Terms
未完待续....
  评论这张
 
阅读(87)| 评论(0)
推荐 转载

历史上的今天

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017