Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
https://en.wikipedia.org/wiki/Data_science
Focus on data with respect to methods
Does it still make sense?
Yes (particularly for vector data)
A data manipulation library
A data visualisation library
Data as tables or catalogs, i.e. vector information
Rows are catalog entries, granules, features (in GIS nomenclature), samples
Columns are observables, attributes (in GIS nomenclature), input parameters, features (in scikit-learn glossary)
Predictions are labels, real values, targets
X = input_data[n_samples,n_attributes]
Y = target_data[n_samples,n_targets]
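A minimal sketch of this layout with NumPy arrays. The values below are made up for illustration, and the names input_data and target_data simply mirror the notation above:

```python
import numpy as np

# Hypothetical catalog: 4 samples (rows) described by 3 attributes (columns).
input_data = np.array([
    [1.0, 0.5, 10.0],
    [2.0, 0.1, 12.0],
    [3.0, 0.9, 9.0],
    [4.0, 0.4, 11.0],
])

# One target value per sample, so n_targets = 1 here.
target_data = np.array([[0.2], [0.4], [0.6], [0.8]])

X = input_data   # shape (n_samples, n_attributes)
Y = target_data  # shape (n_samples, n_targets)

n_samples, n_attributes = X.shape
n_targets = Y.shape[1]
print(n_samples, n_attributes, n_targets)
```

X and Y must have the same number of rows, since each row of Y is the target for the matching row of X.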
Estimators (algorithm classes) share common basic methods
fit
: It takes some samples X (and targets y if the model is supervised), validates the input data, then estimates and stores model attributes from the estimated parameters and provided data.
predict
: It makes a prediction for each sample, taking X as input (in a classifier or regressor).
transform
: If the estimator is a transformer, it transforms the input, usually only X, into a transformed space.
pipeline
: A pipeline of transforms with a final estimator.
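The methods above can be sketched with scikit-learn as follows. The synthetic data and the choice of StandardScaler and LinearRegression are assumptions made for illustration only; any transformer and estimator pair follows the same pattern:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 50 samples, 3 attributes,
# with a noise-free linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

# transform: a transformer is fitted on X, then maps X into a transformed space.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# fit / predict: a supervised estimator learns from (X, y), then predicts per sample.
model = LinearRegression()
model.fit(X_scaled, y)
y_pred = model.predict(X_scaled)

# pipeline: the same two steps chained, exposed as a single estimator.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
y_pipe = pipe.predict(X)
```

Chaining the steps in a pipeline means the scaling learned during fit is reapplied automatically whenever predict is called on new data.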