Research

The initiative is interested in supporting research projects designed to further the foundations of data science. We also aim to build collaborative applied teams of data science researchers to infuse data science across disciplines. One of our ongoing research projects studies rankability, explainable ranking algorithms, and active learning to ranking algorithms. This is describe in more detail below.

Foundations of Rankability

Rankability refers to a dataset’s ability to produce a meaningful ranking of its items. A ranking is a list of items from most to least important. Ranking is a fundamental and underappreciated data science task that permeates almost every aspect of translating computational and algorithmic results into a form that a human can use. Its applications are numerous and include web search, cybersecurity, machine learning, and statistical learning theory.

Minimal attention has been paid to the question of whether a data set is suitable for ranking. Most consumers apply a ranking method without asking questions such as: Can this ranking be trusted? Are parts of the ranking too similar to be disambiguated and possibly meaningless? How can rankability be quantified? Can rankable subgraphs be identified? At what point is a dynamic, time-evolving graph rankable?

RankLib

RankLib is a collection of rankability datasets, test suites, and related resources. The primary purpose is to support researchers working on the rankability problem by centralizing and curating a set of rankability problems with known theoretical, empirical, and partials solutions.

Publications

The Rankability of Data
Paul Anderson, Timothy Chartier, and Amy Langville
SIAM Journal on Mathematics of Data Science 2019 1:1, 121-143
Typos and Corrections

IGARDS Technical Report 1

Software

Python (and other language) implementation of rankability and related algorithms can be found at https://www.github.com/IGARDS/rankability_toolbox.

RankLib scripts, containers, and miscellaneous artifacts used for hosting this library can be found at https://www.github.com/IGARDS/ranklib.