So to find word Y in file X, with p words of context on the left and q on the right, run this command in your terminal:
$ perl kwic.plx Y X p q
My bigram collocation program follows tomorrow!
#! /usr/bin/perl
use warnings;
use strict;
## Author: Barend Beekhuizen
## Exercise 1.7 of Manning and Schütze's book (Statistical NLP): a KWIC program that outputs HTML. The command-line arguments are (1) the search query (a word, treated as a regular expression), (2) the file name of the corpus, (3) the left context size in words, (4) the right context size in words.
# This reads the arguments from the command line and assigns them as values to the four variables
my $searchQuery = shift;
my $corpus = shift;
my $contextLeft = shift;
my $contextRight = shift;
# This initiates three arrays; the first two as the windows of the context, the last one the entire corpus
my @contextLList;
my @contextRList;
my @corpusWords;
# This opens the corpus we declared
open(CORPUS, '<', $corpus) or die "Cannot open $corpus: $!";
# The first string of html-code: initiating a html-script, describing the query and starting a table
print "<html>\n<body>\n",
"<h4>KWIC for <i>$searchQuery</i> in \"$corpus\" ",
"with contexts of $contextLeft left and $contextRight right</h4>\n",
"<table border=\"0\" cellspacing=\"10\">\n";
# preprocessing the corpus: spacing all punctuation marks
while (<CORPUS>)
{
    # put a space before each punctuation mark, then split the line into words
    s/([.,;:?!"'])/ $1/g;
    push @corpusWords, split;
};
close CORPUS;
# going over the corpus
foreach my $i (0..$#corpusWords)
{
    # skip words that do not match the query
    next unless $corpusWords[$i] =~ /\b$searchQuery\b/i;
    # window boundaries, clamped to the start and end of the corpus
    my $from = $i - $contextLeft;
    $from = 0 if $from < 0;
    my $to = $i + $contextRight;
    $to = $#corpusWords if $to > $#corpusWords;
    @contextLList = @corpusWords[$from .. $i-1];
    @contextRList = @corpusWords[$i+1 .. $to];
    my $hitLeft = "@contextLList";
    my $hitRight = "@contextRList";
    my $hit = $corpusWords[$i];
    print "<tr>\n<td align=\"right\">$hitLeft</td>\n<td align=\"center\">$hit</td>\n<td align=\"left\">$hitRight</td>\n</tr>\n";
};
# final string of html
print "</table>\n</body>\n</html>\n";
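
To try the windowing logic without a corpus file, here is a minimal sketch that works on a string in memory. The helper names `tokenize` and `kwic` are made up for this illustration and are not part of kwic.plx; the sketch assumes the same punctuation handling as the script and prints plain text instead of HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of kwic.plx): put a space before each
# punctuation mark, then split the text into tokens on whitespace.
sub tokenize {
    my ($text) = @_;
    $text =~ s/([.,;:?!"'])/ $1/g;
    return split ' ', $text;
}

# Hypothetical helper: return one [left, hit, right] triple per match,
# with at most $left words before and $right words after the hit.
sub kwic {
    my ($query, $left, $right, @words) = @_;
    my @hits;
    for my $i (0 .. $#words) {
        next unless $words[$i] =~ /\b$query\b/i;
        my $from = $i - $left;
        $from = 0 if $from < 0;            # clamp at the start of the corpus
        my $to = $i + $right;
        $to = $#words if $to > $#words;    # clamp at the end of the corpus
        push @hits, [
            join(' ', @words[$from .. $i - 1]),
            $words[$i],
            join(' ', @words[$i + 1 .. $to]),
        ];
    }
    return @hits;
}

# Example sentence invented for this sketch.
my @words = tokenize("The cat sat. The dog saw the cat!");
for my $hit (kwic('cat', 2, 2, @words)) {
    print join(' | ', @$hit), "\n";
}
```

With a context of two words on each side, this prints `The | cat | sat .` and `saw the | cat | !`. The first hit has only one word on its left because the window is cut off at the start of the text rather than padded.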
Hmm, B, maybe you should paste the script once more. Some parts are unreadable right now.
Weird... I'll look into it.