The Clair library is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Network Analysis. Its architecture also allows for external software to be plugged in with very little effort. Functionality native to Clairlib includes Tokenization, Summarization, LexRank, Biased LexRank, Document Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph Analysis, Network Generation, Power Law Distribution Analysis, Network Analysis (clustering coefficient, degree distribution plotting, average shortest path, diameter, triangles, shortest path matrices, connected components), Cosine Similarity, Random Walks on Graphs, Statistics (distributions, tests), Tf, Idf, Community Finding.
The University of Michigan CLAIR (Computational Linguistics and Information Retrieval) group is happy to present version 1.03 of the Clair Library.
The Clair library is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Network Analysis (NA). Its architecture also allows for external software to be plugged in with very little effort.
We are distributing the Clair library in two forms: Clairlib-Core, which has essential functionality and minimal dependence on external software, and Clairlib-Ext, which has extended functionality that may be of interest to a smaller audience. Depending on whether you choose to install only Clairlib-Core or both Clairlib-Core and Clairlib-Ext, some of the content of this manual will not apply to your installation. Throughout this document, for the sake of brevity, we will usually say “the Clair library” or the more abbreviated “Clairlib” to refer to the software we’re distributing.
This work has been supported in part by National Institutes of Health grants R01 LM008106 “Representing and Acquiring Knowledge of Genome Regulation” and U54 DA021519 “National center for integrative bioinformatics,” as well as by National Science Foundation grants IDM 0329043 “Probabilistic and link-based Methods for Exploiting Very Large Textual Repositories,” DHB 0527513 “The Dynamics of Political Representation and Political Rhetoric,” and 0534323 “Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs.”
Much can be done using Clairlib on its own. Some of Clairlib's capabilities are listed below, in separate lists according to whether the functionality is native to a particular Clairlib distribution or is only made available through Clairlib interfaces while actually being provided by another source, such as a CPAN module or external software.
This guide explains how to install both Clairlib distributions, Clairlib-Core and Clairlib-Ext. To install Clairlib-Core, follow the instructions in the section immediately below. To install Clairlib-Ext, first follow the instructions for installing Clairlib-Core, then follow those for Clairlib-Ext itself. Clairlib-Ext requires an installed version of Clairlib-Core in order to run; it is not a stand-alone distribution.
Clairlib-Core requires Perl 5.8.2 or greater. Before you proceed, confirm that the version of Perl you are running is at least this recent by entering perl -v at the shell prompt.
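When the check is part of an installation script rather than done by eye, Perl's special variable $] (the version as a decimal number, 5.008002 for 5.8.2) can be compared directly. The following is a minimal sketch; the function name perl_version_ok is hypothetical and not part of Clairlib.

```shell
# perl_version_ok: succeeds (exit status 0) if the perl found on PATH is at
# least 5.8.2, the minimum version Clairlib-Core requires.
# $] is Perl's version as a decimal number, e.g. 5.008002 for 5.8.2.
perl_version_ok() {
  perl -e 'exit($] >= 5.008002 ? 0 : 1)'
}

if perl_version_ok; then
  # $^V formatted with %vd prints the dotted version string, e.g. 5.36.0
  echo "Perl $(perl -e 'printf "%vd", $^V') is recent enough"
else
  echo "Clairlib-Core requires Perl 5.8.2 or later" >&2
fi
```

If perl is not on the PATH at all, the function fails and the error branch is taken, which is the desired outcome for an installer.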
Download MEAD 3.11 or later from http://www.summarization.com/mead/. The installation package is in .tar.gz (“tarball”) format. To install MEAD in, say, the directory $HOME/mead, ensure that the installation package is located in $HOME, and enter the following at the shell prompt:
$ cd $HOME
$ gunzip .tar
$ cd mead
$ perl Install.PL
Next, you will need to compile tf2gen.cpp to produce an executable required by MEAD. Enter the following:
$ cd $HOME/mead/bin/feature-scripts
$ g++ tf2gen.cpp -o tf2gen
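If you script the installation, the compile step can be wrapped so that it stops with a clear message when g++ is not installed or when no executable is produced. A minimal sketch follows; build_tf2gen is a hypothetical helper, and its default directory simply mirrors the $HOME/mead layout used above.

```shell
# build_tf2gen [DIR]
# Compiles DIR/tf2gen.cpp into DIR/tf2gen with g++. DIR defaults to
# $HOME/mead/bin/feature-scripts. Returns non-zero if g++ is missing,
# compilation fails, or no executable results.
build_tf2gen() {
  dir=${1:-"$HOME/mead/bin/feature-scripts"}
  if ! command -v g++ >/dev/null 2>&1; then
    echo "build_tf2gen: g++ not found on PATH" >&2
    return 1
  fi
  # Compile in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && g++ tf2gen.cpp -o tf2gen) || return 1
  # Final check: the output file exists and is executable.
  [ -x "$dir/tf2gen" ]
}

# Example: build_tf2gen && echo "tf2gen built"
```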
Clairlib-Core depends on access to the following Perl modules:
There are multiple approaches to locating and installing these modules; using the automated CPAN installer, which is bundled with Perl, is perhaps the quickest and easiest. To do so, enter the following at the shell prompt:
$ perl -MCPAN -e shell
If you have not yet configured the CPAN installer, you will have to do so this one time. If you do not know the answer to any of the questions asked, simply hit Enter; the default options will likely suit your environment. However, when asked about parameter options for the perl Makefile.PL command, users without root permissions, or who otherwise wish to install Perl libraries within their personal $HOME directory structure, should enter the suggested path when prompted:
This will cause the CPAN installer to install all modules it downloads and tests into $HOME/perl, which means that all subdirectories of this directory that contain Perl modules will need to be added to Perl’s @INC variable so that they will be found when needed (see section V below for further explanation).
As a side note, if you ever need to reconfigure the installer, type at the shell prompt:
$ perl -MCPAN -e shell
cpan>o conf init
After configuration (if needed), return to the CPAN shell prompt (cpan>) and type the following to upgrade the CPAN installer to the latest version:
cpan>install Bundle::CPAN
cpan>q
If asked whether to prepend the installation of required libraries to the queue, hit return (or enter yes). After quitting the shell, type the following to install or upgrade Module::Build and make it the preferred installer:
$ perl -MCPAN -e shell
cpan>install Module::Build
cpan>o conf prefer_installer MB
cpan>o conf commit
cpan>q
Finally, install each of the following dependencies (if you are at all uns