Example Datasets
The package contains a few static datasets that are intended to serve as toy examples.
Warning
This section may be subject to larger changes. In the future, the datasets may instead be provided by JuliaML/MLDatasets.jl.
Fisher’s Iris data set
The Iris data set has become one of the most recognizable machine learning example datasets. It was originally published by Ronald Fisher [FISHER1936] and contains 4 different kinds of measurements (which we call features) for 150 observations of a plant called Iris. The interesting property of the dataset is that it includes these measurements for 3 different species of Iris (50 observations each), which is why it is commonly used to showcase classification or clustering algorithms.
load_iris([n]) → Tuple
Loads the first n observations from the Iris flower data set introduced by Ronald Fisher (1936).
Parameters: n (Int) – default 150. Specifies how many of the total 150 observations should be returned (in their native order).
Returns: A tuple of three arrays, as the following code snippet shows. The 4 by n matrix X contains the numeric measurements, where each column denotes one observation. The vector y contains the class labels as strings. The optional vector names contains the names of the features (i.e. the rows of X).
    X, y, names = load_iris(n)
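Because each column of X is one observation, per-class statistics are computed by selecting columns rather than rows. The following is a minimal sketch of that layout, not part of the package: class_means is a hypothetical helper, and the small X and y are stand-ins with the same shape as the output of load_iris(6), so that the snippet runs without the package installed.

```julia
using Statistics  # stdlib: provides mean

# Stand-in for `X, y, names = load_iris(6)`. The values and label strings are
# illustrative assumptions, not output produced by the package.
X = [5.1 4.9 7.0 6.4 6.3 5.8;
     3.5 3.0 3.2 3.2 3.3 2.7;
     1.4 1.4 4.7 4.5 6.0 5.1;
     0.2 0.2 1.4 1.5 2.5 1.9]
y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

# Hypothetical helper: mean feature vector per class, assuming that
# observations are stored as columns of X and y holds one label per column.
function class_means(X::AbstractMatrix, y::AbstractVector)
    size(X, 2) == length(y) || throw(DimensionMismatch("expected one label per column"))
    return Dict(label => vec(mean(X[:, y .== label], dims=2)) for label in unique(y))
end

means = class_means(X, y)
for (label, m) in means
    println(label, " => ", m)
end
```

With the real data, the stand-in arrays would be replaced by the tuple returned from load_iris.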
See the Wikipedia entry on the Iris flower data set for more information about the dataset.
[FISHER1936] Fisher, Ronald A. “The use of multiple measurements in taxonomic problems.” Annals of Eugenics 7.2 (1936): 179-188.
Noisy Line Example
This refers to a static, pre-defined toy dataset. To generate
a noisy line from custom parameters, take a look at
noisy_function().
load_line() → Tuple
Loads an artificial example dataset for a noisy line. It is particularly useful to explain under- and overfitting.
Returns: The vector x contains 11 equally spaced points between 0 and 1. The vector y contains x ./ 2 + 1 plus some Gaussian noise. The optional vector names contains descriptive names for x and y.
    x, y, names = load_line()
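To illustrate what such a dataset looks like and how a straight line can be recovered from it, here is a rough sketch that builds comparable data from scratch and fits it by least squares. It does not call the package; fit_line is a hypothetical helper, and the noise level and random seed are assumptions.

```julia
using Random  # stdlib: seeding for reproducibility

# Hypothetical helper: least-squares fit of y ≈ a + b*x, returns (intercept, slope).
function fit_line(x::AbstractVector, y::AbstractVector)
    A = [ones(length(x)) x]   # design matrix: constant column and x
    a, b = A \ y              # least-squares solution
    return a, b
end

Random.seed!(42)
x = collect(range(0, stop=1, length=11))      # 11 equally spaced points in [0, 1]
y = x ./ 2 .+ 1 .+ 0.1 .* randn(length(x))    # noise level 0.1 is an assumption

a, b = fit_line(x, y)
println("intercept ≈ ", round(a, digits=2), ", slope ≈ ", round(b, digits=2))
```

Note that current Julia versions require the dotted form .+ when adding a scalar to a vector. With noise this small, the fitted intercept and slope should land near 1 and 0.5.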
Noisy Sin Example
This refers to a static, pre-defined toy dataset. To generate
a noisy sine wave from custom parameters, take a look at
noisy_sin().
load_sin() → Tuple
Loads an artificial example dataset for a noisy sine wave. It is particularly useful to explain under- and overfitting.
Returns: The vector x contains equally spaced points between 0 and 2π. The vector y contains sin(x) plus some Gaussian noise. The optional vector names contains descriptive names for x and y.
    x, y, names = load_sin()
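One way to see underfitting on data of this kind is to fit polynomials of increasing degree and watch the training error. The sketch below generates comparable data itself instead of calling the package; design and train_mse are hypothetical helpers, and the number of points, the noise level, and the seed are assumptions.

```julia
using Random  # stdlib: seeding for reproducibility

# Hypothetical helper: design matrix with columns x.^0, x.^1, ..., x.^d.
design(x, d) = [xi^p for xi in x, p in 0:d]

# Hypothetical helper: mean squared training error of a degree-d
# least-squares polynomial fit.
function train_mse(x, y, d)
    A = design(x, d)
    w = A \ y
    return sum(abs2, A * w .- y) / length(y)
end

Random.seed!(1)
x = collect(range(0, stop=2π, length=20))   # 20 points is an assumption
y = sin.(x) .+ 0.2 .* randn(length(x))      # noise level 0.2 is an assumption

for d in (1, 3, 9)
    println("degree ", d, ": training MSE = ", train_mse(x, y, d))
end
```

A degree-1 fit cannot follow the curve (underfitting), while a high degree starts to chase the noise. Training error alone keeps shrinking as the degree grows, so spotting overfitting requires evaluating on held-out points as well.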
Noisy Polynomial Example
This refers to a static, pre-defined toy dataset. To generate
a noisy polynomial from custom parameters, take a look at
noisy_poly().
load_poly() → Tuple
Loads an artificial example dataset for a noisy quadratic function. It is particularly useful to explain under- and overfitting.
Returns: The vector x contains 50 points between 0 and 4. The vector y contains 2.6 * x^2 + .8 * x plus some Gaussian noise. The optional vector names contains descriptive names for x and y.
    x, y, names = load_poly()
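The under- and overfitting behaviour mentioned above becomes visible once part of the data is held out. The following sketch builds comparable quadratic data on its own and compares training and test error for several polynomial degrees. It is an illustration under stated assumptions, not the package's method: polyfit, polyval, and mse are hypothetical helpers, and the equal spacing, unit noise level, seed, and interleaved split are all assumptions.

```julia
using Random  # stdlib: seeding for reproducibility

# Hypothetical helper: least-squares polynomial coefficients, lowest order first.
polyfit(x, y, d) = [xi^p for xi in x, p in 0:d] \ y

# Hypothetical helper: evaluate coefficients w (lowest order first) at points x.
polyval(w, x) = [sum(w[p + 1] * xi^p for p in 0:length(w)-1) for xi in x]

# Mean squared error between two vectors.
mse(a, b) = sum(abs2, a .- b) / length(a)

Random.seed!(7)
x = collect(range(0, stop=4, length=50))           # equal spacing is an assumption
y = 2.6 .* x.^2 .+ 0.8 .* x .+ randn(length(x))    # unit noise is an assumption

train, test = 1:2:50, 2:2:50                       # simple interleaved split
for d in (1, 2, 8)
    w = polyfit(x[train], y[train], d)
    println("degree ", d,
            ": train MSE = ", mse(polyval(w, x[train]), y[train]),
            ", test MSE = ", mse(polyval(w, x[test]), y[test]))
end
```

Degree 1 is too rigid for the quadratic trend, so both errors stay high. Degree 2 matches the generating function. A much higher degree can lower the training error further without a matching improvement on the held-out points.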