<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Projects on Laurence A. F. Park</title><link>https://lapark.github.io/projects/</link><description>Recent content in Projects on Laurence A. F. Park</description><generator>Hugo</generator><language>en-us</language><copyright>This page, its contents and style, are the responsibility of the author and do not necessarily represent the views, policies or opinions of Western Sydney University. The header image Memorial to Folly is unofficial Fan Content permitted under the Fan Content Policy. Not approved/endorsed by Wizards. Portions of the materials used are property of Wizards of the Coast. ©Wizards of the Coast LLC. All other content &amp;copy; Laurence Park</copyright><atom:link href="https://lapark.github.io/projects/index.xml" rel="self" type="application/rss+xml"/><item><title>Approximate clustering of very large scale data</title><link>https://lapark.github.io/projects/covat/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/covat/</guid><description>&lt;p&gt;Many machine learning, data mining and statistical analysis tasks are
used to identify properties of data, or to build models of sampled
data. Unfortunately, these analytical methods are computationally
expensive; their computational resource use being a function of the
dimensionality and sample count of the data being analysed. Recent
advances is data acquisition (such as genetic sequencing) have allowed
us to capture very large amounts of data (of the order of terabytes)
for analysis and modelling. The complexity of current machine learning
and data mining methods makes them infeasible to be directly applied
to such large scale data.&lt;/p&gt;</description></item><item><title>Automatic thesaurus construction using non-linear term relationships</title><link>https://lapark.github.io/projects/autothesaurus/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/autothesaurus/</guid><description>&lt;p&gt;Text based information retrieval systems retrieve documents based on
the set of key terms provided to them. The documents returned are
ranked according to the count of each query term, therefore if the
query terms do not exist in the document it is not found. Latent
semantic analysis (LSA) is a method of computing hidden topics within
documents using linear algebra. By obtaining the relationships between
each hidden topic and each term, we are able to compute which terms
are similar by comparing the similarity of each of the terms
topics. This hidden topic information allows the retrieval system to
return documents that do not contain the query terms, but do contain
terms that are similar to the query terms (shown in Fig. 5). The
current linear algebraic techniques use the Euclidean distance as a
similarity measure for vectors. Unfortunately, the Euclidean distance
is not a useful metric for term or document similarity.&lt;/p&gt;</description></item><item><title>Discovering document model deficiencies for information retrieval</title><link>https://lapark.github.io/projects/deficiencies/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/deficiencies/</guid><description>&lt;p&gt;Text based information retrieval systems are built using document
models. To analyse the retrieval precision of a model, a set of
queries are provided to the model and the results are compared to the
desired results. This type of analysis allows us to compare the
precision of different retrieval models, but it does not provide us
with any feedback on where the models could be improved. Currently
there are no methods of analysis of text retrieval systems that are
able to show where deficiencies lie within the document model. We aim
to investigate methods of retrieval analysis that is able to reveal
where specific document model deficiencies occur, using a given query
and document set. Our analysis will allow information retrieval
experiments to be more thorough and show why certain document models
achieve a certain precision, thus allowing the document models to be
adjusted and improved.&lt;/p&gt;</description></item><item><title>Efficient and effective use of Language Models for Information Retrieval</title><link>https://lapark.github.io/projects/efficientlm/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/efficientlm/</guid><description>&lt;p&gt;Language models for text based information retrieval have become a de
facto standard due to their simplicity and effectiveness. Recently,
several language modelling techniques have been developed that assume
a hidden distribution of topics within the set of documents. Such
methods include Probabilistic Latent Semantic Analysis (PLSA) and
Latent Dirichlet Allocation (LDA), where the former uses a multinomial
distribution of topics, while the latter uses a Dirichlet prior. By
using this notion of hidden topics, we are able to compute
relationships from term to topic and hence term to
term. Unfortunately, these language modelling methods produces large
amounts of data and require lengthy periods of time to perform
document retrieval.&lt;/p&gt;</description></item><item><title>Improving the quality of Web information retrieval using multi-resolution link analysis</title><link>https://lapark.github.io/projects/multiresolution/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/multiresolution/</guid><description>&lt;p&gt;The World Wide Web is the most important information source in modern
society. The information within the Web is accessed through Web search
engines such as Google and Yahoo. These search engines use a global
popularity rank during the computation of their results, based on
links from all of the pages on the Web. This global popularity rank
used by Google is known as PageRank. Using these global popularity
ranks biases the search results towards the globally popular Web
pages. Our aim is to investigate the effect of a peer popularity rank
where popularity is measured only through sites of a similar
nature. We perform this task by observing the interaction between Web
pages at a finer resolution (as shown in Fig. 1). We expect that using
the peer popularity rank will allow search engines to locate Web pages
that are more specific to the user needs rather than pages of general
popularity. This will improve the quality of information that is
delivered by Web search engines and provide organisation amongst the
billions of existing Web pages.&lt;/p&gt;</description></item><item><title>Relevance-based document models for Information Retrieval</title><link>https://lapark.github.io/projects/relevancemodels/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/relevancemodels/</guid><description>&lt;p&gt;Document models are used for information retrieval in order to compute
the probability of a query being related to the document. The majority
of document models are functions of the terms that appear within the
document. This implies that a query is only relevant to a document if
the query terms exist within the document, which is far from the
truth.&lt;/p&gt;
&lt;p&gt;In our project, we have created a new form of document model called, a
Relevance-based document model, which is built based on the relevance
of each query to the document and not the words that appear within the
document. Relevance-based document models are constructed using a set
of queries and the associated relevance of the query to the document
(shown in Fig. 6). This information allows us to construct a model
that provides 100% precision results for the known queries and large
improvements in precision over document content-based models for
partially known queries. Since the relevance-based document models are
based only on the query terms and their relevance to a document, the
time required to build the models is very fast and the storage
required to store the model is very compact.&lt;/p&gt;</description></item><item><title>Web page prefetching using temporal link analysis</title><link>https://lapark.github.io/projects/templinkanalysis/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lapark.github.io/projects/templinkanalysis/</guid><description>Web searching should be as simple as providing the search engine with
a query and having the search engine return a link to the desired Web
page. Unfortunately, current Web search engines use text based queries
and therefore require the user to provide keywords. Converting the
users information need into a few key words is not a simple
process. Due to this, Web search patterns involve the user visiting
many Web pages that do not satisfy their information need, while
interleaving this process with several visits to the Web search
engine. Rather than the user actively searching for pages using key
words, the search engine could provide pages to the user based on
their Web usage patterns. By examining the users Web history, we will
be able to compute the types of Web pages the user desires and hence
provide search results to the user without the need for key
words. This method of passive Web searching is called
prefetching. Given that Google has had such success in using link
analysis to provide useful retrieval results, we aim to investigate
the utility of Web links to perform prefetching.

To date, we have successfully applied eigenvalue analysis to our
temporal link graph (shown in Fig. 3) to obtain a measure of
importance for each page. We will be examining the effect of using
Hidden Markov models and their benefit over the eigenvalue analysis.

&lt;figure&gt;&lt;img src="../projects/templinkanalysis/image006.jpg"&gt;&lt;figcaption&gt;
 &lt;h4&gt;A temporal link graph showing the time spent on each page (shown in each circle) and the portion of time from users that have followed each of the incoming links (shown on each of the directed edges).&lt;/h4&gt;
 &lt;/figcaption&gt;
&lt;/figure&gt;</description></item></channel></rss>