Matthias Templ

GEOSTAT 2016

Karel Hron and Peter Filzmoser for a long-term and fruitful cooperation

Karel Hron for providing his slides on compositional data analysis (used in some parts of the presentation)

- **petroleum geology**, hydrogeology, hydrology
- **meteorology**, oceanography
- **geochemistry**, **geometallurgy**
- geography
- forestry, environmental control, landscape ecology
- **soil science**, agriculture

Example of compositional spatial and temporal data: proportions of land use/land types or forest fragmentation proportions in each grid cell, with potential covariates (these may include elevation range, road length, population, median household income, and housing levels).

- What are compositional data?
- Real space vs the simplex, representation in Coordinates
- Examples
- Applications in multivariate statistics using geochemical data
- Why use **robust** methods?
- The R package **robCompositions**

- \( D \)-part vectors, describing quantitatively the parts of some whole, which carry exclusively relative information between the parts (Aitchison, 1986; Pawlowsky-Glahn et al., 2015)
- Typical units of measurement: percentages, mg/kg, mg/l
- Examples: geochemical data (proportions of minerals in a rock); concentrations of phenolic acids in wine (mg/l); household expenditures on various commodities (foodstuff, housing, clothing); forest fragmentation proportions, etc.
- Compositional data consist of multivariate observations with positive values that sum up to a constant. Examples are proportional data or percentages, for which the values sum up row-wise to 1 or 100.
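The constant-sum representation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation or from robCompositions: the function name `closure` and the numeric values are assumptions chosen for the example. It rescales a vector of positive parts to a chosen constant and shows that the ratios between parts, which carry the relative information, are unchanged by the rescaling.

```python
def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    s = sum(parts)
    return [total * p / s for p in parts]


# Hypothetical 3-part observation in arbitrary units (illustrative values only).
raw = [2.0, 3.0, 5.0]

as_percent = closure(raw, total=100.0)    # rows sum to 100
as_proportion = closure(raw, total=1.0)   # rows sum to 1

# The ratio between any two parts is the same in every representation.
ratio_raw = raw[0] / raw[1]
ratio_percent = as_percent[0] / as_percent[1]
ratio_proportion = as_proportion[0] / as_proportion[1]
```

Because only the ratios matter, the choice of the constant (1, 100, or 10^6 for ppm) is a matter of convenience.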

- What if one or more variables of the multivariate data are not available or have not been measured?
- What if rounding errors lead to a violation of the prescribed constraint?
- Or what if the sum is not constant at all, but differs considerably between compositional observations?

The answer: it (always) depends on the analysis goals

**Absolute information**: refers to the original raw data, in their concrete units such as counts, monetary units, temperature, precipitation, etc.

**Relative information**: refers to a relative data representation, like proportions or percentages, such as concentration of chemical elements in parts per million (ppm) or mg/kg, share of family income to gross household income, percentage of votes for a political party, daylight per day, etc.

- relative information is analyzed by considering (log-)ratios between the variables.
- representation of data in orthonormal/orthogonal coordinates
- analysis on orthonormal/orthogonal coordinates and backtransformation to the original space
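The three steps above can be sketched as follows. This is a hedged, standard-library-only illustration that assumes one particular choice of orthonormal coordinates, the so-called pivot coordinates (an isometric log-ratio system); the slides do not fix a specific system at this point, and the function names `pivot_basis`, `to_coordinates` and `from_coordinates` are hypothetical, not the robCompositions API.

```python
import math


def pivot_basis(D):
    """Return D-1 orthonormal basis vectors (each of length D, summing to zero)."""
    basis = []
    for i in range(1, D):                  # i = 1, ..., D-1
        r = D - i                          # number of parts after part i
        row = [0.0] * (i - 1)              # parts before part i are not involved
        row.append(math.sqrt(r / (r + 1)))
        row.extend([-1.0 / math.sqrt(r * (r + 1))] * r)
        basis.append(row)
    return basis


def to_coordinates(x):
    """Map a composition with D positive parts to D-1 real coordinates."""
    logs = [math.log(v) for v in x]
    return [sum(b * l for b, l in zip(row, logs)) for row in pivot_basis(len(x))]


def from_coordinates(z, total=1.0):
    """Back-transform D-1 real coordinates to a composition summing to `total`."""
    D = len(z) + 1
    basis = pivot_basis(D)
    # Centred log-ratio representation as a linear combination of basis vectors.
    clr = [sum(z[i] * basis[i][j] for i in range(D - 1)) for j in range(D)]
    expd = [math.exp(c) for c in clr]
    s = sum(expd)
    return [total * e / s for e in expd]


# Illustrative composition (assumed values).
x = [0.2, 0.3, 0.5]
z = to_coordinates(x)                      # 2 real coordinates
x_back = from_coordinates(z)               # recovers x up to rounding

# The same composition in percent gives the same coordinates.
z_percent = to_coordinates([20.0, 30.0, 50.0])
```

Standard multivariate methods would then be applied to `z` (one row per observation), and results that need to be interpreted in the original parts would be mapped back with `from_coordinates`.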

A NO GO:

Statistical analysis of compositional data using **standard statistical methods**, which assume **Euclidean geometry in real space**, is **just wrong**, yet it is typically applied in practice.
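One way to see the problem is to compare the ordinary Euclidean distance with a distance based on log-ratios. The sketch below uses the Aitchison distance, computed as the Euclidean distance between centred log-ratio (clr) vectors; the slides do not name this distance here, and the compositions are made-up values chosen for illustration. Both pairs are equally far apart in the Euclidean sense, but in the first pair a small part doubles, while in the second pair the parts barely change in relative terms.

```python
import math


def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr representations of x and y."""
    return math.dist(clr(x), clr(y))


# Pair A: the first part doubles from 1% to 2% (illustrative values).
a1, a2 = [0.01, 0.49, 0.50], [0.02, 0.48, 0.50]
# Pair B: the same absolute shift of 0.01, but from 30% to 31%.
b1, b2 = [0.30, 0.20, 0.50], [0.31, 0.19, 0.50]

euclid_a = math.dist(a1, a2)
euclid_b = math.dist(b1, b2)              # equal to euclid_a up to rounding

aitch_a = aitchison_distance(a1, a2)      # large: a part has doubled
aitch_b = aitchison_distance(b1, b2)      # small: minor relative change
```

The Euclidean distance treats both changes as identical, whereas the log-ratio-based distance reflects that a doubling of a trace part is a much larger relative change.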

Absolute and relative concentrations of phosphorus (P), determined by X-ray fluorescence (XRF), for samples from agricultural soils in Europe.