Example Datasets¶
The package contains a few static datasets that are intended to serve as toy examples.
Warning
This section may be subject of larger changes. It is possible that in the future the datasets will instead be provided by JuliaML/MLDatasets.jl instead.
Fisher’s Iris data set¶
The Iris data set has become one of the most recognizable machine learning example datasets. It was originally published by Ronald Fisher [FISHER1936] and contains the 4 different kind of measurements (that we call features) for 150 observations of a plant called Iris. The interesting property of the dataset is that it includes these measurements for 3 different species of Iris (50 observations each) and is thus a dataset that is commonly used to showcase classification or clustering algorithms.
-
load_iris
([n]) → Tuple¶ Loads the first
n
observations from the Iris flower data set introduced by Ronald Fisher (1936).Parameters: n (Int) – default 150
. Specifies how many of the total 150 observations should be returned (in their native order).Returns: A tuple of three arrays as the following code snipped shows. The 4 by n
matrixX
contains the numeric measurements, in which each individual column denotes an observation. The vectory
contains the class labels as strings. The optional vectornames
contains the names of the features (i.e. rows ofX
)X, y, names = load_iris(n)
Check out the wikipedia entry for more information about the dataset.
[FISHER1936] | Fisher, Ronald A. “The use of multiple measurements in taxonomic problems.” Annals of eugenics 7.2 (1936): 179-188. |
Noisy Line Example¶
This refers to a static pre-defined toy dataset. In order to
generate a noisy line using some parameters take a look at
noisy_function()
.
-
load_line
() → Tuple¶ Loads an artificial example dataset for a noisy line. It is particularly useful to explain under- and overfitting.
Returns: The vector x
contains 11 equally spaced points between 0 and 1. The vectory
containsx ./ 2 + 1
plus some gaussian noise. The optional vectornames
contains descriptive names forx
andy
.x, y, names = load_line()
Noisy Sin Example¶
This refers to a static pre-defined toy dataset. In order to
generate a noisy sin using some parameters take a look at
noisy_sin()
.
-
load_sin
() → Tuple¶ Loads an artificial example dataset for a noisy sin. It is particularly useful to explain under- and overfitting.
Returns: The vector x
contains equally spaced points between 0 and 2π. The vectory
containssin(x)
plus some gaussian noise. The optional vectornames
contains descriptive names forx
andy
.x, y, names = load_sin()
Noisy Polynome Example¶
This refers to a static pre-defined toy dataset. In order to
generate a noisy polynome using some parameters take a look at
noisy_poly()
.
-
load_poly
() → Tuple¶ Loads an artificial example dataset for a noisy quadratic function.
Returns: It is particularly useful to explain under- and overfitting. The vector x
contains 50 points between 0 and 4. The vectory
contains2.6 * x^2 + .8 * x
plus some gaussian noise. The optional vectornames
contains descriptive names forx
andy
.x, y, names = load_poly()