Sorting data and making subsets
In this section and onward we will use a People dataset instead of generating numbers. The dataset consists of measurements of 12 variables done for 32 persons (16 males and 16 females, half from Scandinavia, half from Mediterranean region). The variables include height, weight, shoe size, income, beer and wine consumption and so on. For details, please, see K. Esbensen. Multivariate data analysis in practice. Camo, 2005.
The data can be sorted using one or several columns:
load people
show(people)
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- --------- ----- ----- ---- ----- ------- ----
Lars 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
Peter 184 84 -1 44 33 3.3e+04 350 102 -1 92 -1 130
Rasmus 183 83 -1 44 37 3.4e+04 320 98 -1 91 -1 127
Lene 166 47 -1 36 32 2.8e+04 270 78 1 75 -1 112
Mette 170 60 1 38 23 2e+04 312 99 1 81 -1 110
Gitte 172 64 1 39 24 2.2e+04 308 91 1 82 -1 102
Jens 182 80 -1 42 35 3e+04 398 65 -1 85 -1 140
Erik 180 80 -1 43 36 3e+04 388 63 -1 84 -1 129
Lotte 169 51 1 36 24 2.3e+04 250 89 1 78 -1 98
Heidi 168 52 1 37 27 2.35e+04 260 86 1 78 -1 100
Kaj 183 81 -1 42 37 3.5e+04 345 45 -1 90 -1 105
Gerda 157 47 1 36 32 3.2e+04 235 92 1 70 -1 127
Anne 164 50 1 38 41 3.4e+04 255 134 1 76 -1 101
Britta 162 49 1 37 40 3.4e+04 265 124 1 75 -1 108
Magnus 180 82 -1 44 43 3.7e+04 355 82 -1 88 -1 109
Casper 180 81 -1 44 46 4.2e+04 362 90 -1 86 -1 113
Luka 185 82 -1 45 26 1.6e+04 295 180 -1 92 1 109
Federico 187 84 -1 46 27 1.65e+04 299 178 -1 95 1 119
Dona 168 50 1 37 49 3.4e+04 170 162 1 76 1 135
Fabrizia 166 49 1 36 21 1.4e+04 150 245 1 75 1 123
Lisa 158 46 1 34 30 1.8e+04 120 120 1 70 1 119
Benito 177 65 -1 41 26 1.8e+04 209 160 -1 86 1 120
Franko 180 72 -1 43 33 1.9e+04 236 175 -1 85 1 115
Alessandro 181 75 -1 43 42 3.1e+04 198 161 -1 83 1 105
Leonora 163 50 1 36 18 1.1e+04 143 136 1 75 1 102
Giuliana 162 50 1 36 20 1.15e+04 133 146 1 74 1 132
Giovanni 176 68 -1 42 50 3.6e+04 195 177 -1 82 1 96
Leonardo 175 67 1 42 55 3.8e+04 185 187 -1 80 1 105
Marta 165 51 1 36 36 2.6e+04 121 129 1 76 1 126
Rosetta 161 48 1 35 41 3.15e+04 116 196 1 75 1 120
Romeo 178 75 -1 42 30 2.4e+04 203 208 -1 81 1 118
Romina 160 48 1 35 40 3.1e+04 118 198 1 74 1 129
people.sort('Sex')
show(people)
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- --------- ----- ----- ---- ----- ------- ----
Lars 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
Peter 184 84 -1 44 33 3.3e+04 350 102 -1 92 -1 130
Rasmus 183 83 -1 44 37 3.4e+04 320 98 -1 91 -1 127
Magnus 182 80 -1 42 35 3e+04 398 65 -1 85 -1 140
Casper 180 80 -1 43 36 3e+04 388 63 -1 84 -1 129
Benito 183 81 -1 42 37 3.5e+04 345 45 -1 90 -1 105
Leonardo 180 82 -1 44 43 3.7e+04 355 82 -1 88 -1 109
Romeo 180 81 -1 44 46 4.2e+04 362 90 -1 86 -1 113
Lene 185 82 -1 45 26 1.6e+04 295 180 -1 92 1 109
Mette 187 84 -1 46 27 1.65e+04 299 178 -1 95 1 119
Gerda 177 65 -1 41 26 1.8e+04 209 160 -1 86 1 120
Anne 180 72 -1 43 33 1.9e+04 236 175 -1 85 1 115
Britta 181 75 -1 43 42 3.1e+04 198 161 -1 83 1 105
Lisa 176 68 -1 42 50 3.6e+04 195 177 -1 82 1 96
Leonora 175 67 1 42 55 3.8e+04 185 187 -1 80 1 105
Rosetta 178 75 -1 42 30 2.4e+04 203 208 -1 81 1 118
Jens 166 47 -1 36 32 2.8e+04 270 78 1 75 -1 112
Erik 170 60 1 38 23 2e+04 312 99 1 81 -1 110
Kaj 172 64 1 39 24 2.2e+04 308 91 1 82 -1 102
Luka 169 51 1 36 24 2.3e+04 250 89 1 78 -1 98
Federico 168 52 1 37 27 2.35e+04 260 86 1 78 -1 100
Franko 157 47 1 36 32 3.2e+04 235 92 1 70 -1 127
Alessandro 164 50 1 38 41 3.4e+04 255 134 1 76 -1 101
Giovanni 162 49 1 37 40 3.4e+04 265 124 1 75 -1 108
Gitte 168 50 1 37 49 3.4e+04 170 162 1 76 1 135
Lotte 166 49 1 36 21 1.4e+04 150 245 1 75 1 123
Heidi 158 46 1 34 30 1.8e+04 120 120 1 70 1 119
Dona 163 50 1 36 18 1.1e+04 143 136 1 75 1 102
Fabrizia 162 50 1 36 20 1.15e+04 133 146 1 74 1 132
Giuliana 165 51 1 36 36 2.6e+04 121 129 1 76 1 126
Marta 161 48 1 35 41 3.15e+04 116 196 1 75 1 120
Romina 160 48 1 35 40 3.1e+04 118 198 1 74 1 129
people.sort({'Sex', 'Region'}, 'descend')
show(people)
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- --------- ----- ----- ---- ----- ------- ----
Lars 168 50 1 37 49 3.4e+04 170 162 1 76 1 135
Peter 166 49 1 36 21 1.4e+04 150 245 1 75 1 123
Rasmus 158 46 1 34 30 1.8e+04 120 120 1 70 1 119
Jens 163 50 1 36 18 1.1e+04 143 136 1 75 1 102
Erik 162 50 1 36 20 1.15e+04 133 146 1 74 1 132
Kaj 165 51 1 36 36 2.6e+04 121 129 1 76 1 126
Magnus 161 48 1 35 41 3.15e+04 116 196 1 75 1 120
Casper 160 48 1 35 40 3.1e+04 118 198 1 74 1 129
Luka 166 47 -1 36 32 2.8e+04 270 78 1 75 -1 112
Federico 170 60 1 38 23 2e+04 312 99 1 81 -1 110
Benito 172 64 1 39 24 2.2e+04 308 91 1 82 -1 102
Franko 169 51 1 36 24 2.3e+04 250 89 1 78 -1 98
Alessandro 168 52 1 37 27 2.35e+04 260 86 1 78 -1 100
Giovanni 157 47 1 36 32 3.2e+04 235 92 1 70 -1 127
Leonardo 164 50 1 38 41 3.4e+04 255 134 1 76 -1 101
Romeo 162 49 1 37 40 3.4e+04 265 124 1 75 -1 108
Lene 185 82 -1 45 26 1.6e+04 295 180 -1 92 1 109
Mette 187 84 -1 46 27 1.65e+04 299 178 -1 95 1 119
Gitte 177 65 -1 41 26 1.8e+04 209 160 -1 86 1 120
Lotte 180 72 -1 43 33 1.9e+04 236 175 -1 85 1 115
Heidi 181 75 -1 43 42 3.1e+04 198 161 -1 83 1 105
Gerda 176 68 -1 42 50 3.6e+04 195 177 -1 82 1 96
Anne 175 67 1 42 55 3.8e+04 185 187 -1 80 1 105
Britta 178 75 -1 42 30 2.4e+04 203 208 -1 81 1 118
Dona 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
Fabrizia 184 84 -1 44 33 3.3e+04 350 102 -1 92 -1 130
Lisa 183 83 -1 44 37 3.4e+04 320 98 -1 91 -1 127
Leonora 182 80 -1 42 35 3e+04 398 65 -1 85 -1 140
Giuliana 180 80 -1 43 36 3e+04 388 63 -1 84 -1 129
Marta 183 81 -1 42 37 3.5e+04 345 45 -1 90 -1 105
Rosetta 180 82 -1 44 43 3.7e+04 355 82 -1 88 -1 109
Romina 180 81 -1 44 46 4.2e+04 362 90 -1 86 -1 113
Making subsets is as easy as with matrices, just specify indices for rows and columns:
show(people([1, 2], :))
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- -------- ----- ----- ---- ----- ------- ----
Lars 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
Peter 184 84 -1 44 33 3.3e+04 350 102 -1 92 -1 130
show(people(1:3, 1:2:end-2))
People:
People dataset
Variables
Height Hairleng Age Beer Sex
------- --------- ---- ----- ----
Lars 198 -1 48 420 -1
Peter 184 -1 33 350 -1
Rasmus 183 -1 37 320 -1
In addition to that, mdadata
objects can be also subset by using column and row names, either one or several, combined to a cell array.
show(people(1:5, 'Height'))
People:
People dataset
Height
-------
Lars 198
Peter 184
Rasmus 183
Lene 166
Mette 170
show(people('Lars', :))
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- -------- ----- ----- ---- ----- ------- ----
Lars 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
show(people({'Lars', 'Lene'}, {'Height', 'Weight'}))
People:
People dataset
Variables
Height Weight
------- -------
Lars 198 92
Lene 166 47
Names can also be used to specify a sequence of columns or rows:
show(people(1:3, 'Height:Income'))
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income
------- ------- --------- --------- ---- --------
Lars 198 92 -1 48 48 4.5e+04
Peter 184 84 -1 44 33 3.3e+04
Rasmus 183 83 -1 44 37 3.4e+04
show(people('Lars:Lene', :))
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income Beer Wine Sex Swim Region IQ
------- ------- --------- --------- ---- -------- ----- ----- ---- ----- ------- ----
Lars 198 92 -1 48 48 4.5e+04 420 115 -1 98 -1 100
Peter 184 84 -1 44 33 3.3e+04 350 102 -1 92 -1 130
Rasmus 183 83 -1 44 37 3.4e+04 320 98 -1 91 -1 127
Lene 166 47 -1 36 32 2.8e+04 270 78 1 75 -1 112
show(people('Lars:Lene', 'Height:Income'))
People:
People dataset
Variables
Height Weight Hairleng Shoesize Age Income
------- ------- --------- --------- ---- --------
Lars 198 92 -1 48 48 4.5e+04
Peter 184 84 -1 44 33 3.3e+04
Rasmus 183 83 -1 44 37 3.4e+04
Lene 166 47 -1 36 32 2.8e+04
Be sure that you use correct column names, not the ones that were specified for printing or plotting. Check data.colNames
if you have doubts.
d = mdadata([180 85; 165 55], {}, {'Height (cm)', 'Body mass (kg)'});
disp('Column names are:')
disp(d.colNames);
% correct name
show(d(:, 'Bodymass'))
% wrong name
try
show(d(:, 'Body mass (kg)'))
catch e
disp(e.message)
end
Column names are:
'Height' 'Bodymass'
Body mass (kg)
---------------
85
55
Wrong values for column indices!
Last but not least, one can use logical expressions with columns to make subsets. In this case the logical expression must be written as a text string and be used instead of row indices. In the expression you can use column names or numbers. Here are some examples:
show(people('Sex == 1', 1:4));
People:
People dataset
Variables
Height Weight Hairleng Shoesize
------- ------- --------- ---------
Lene 166 47 -1 36
Mette 170 60 1 38
Gitte 172 64 1 39
Lotte 169 51 1 36
Heidi 168 52 1 37
Gerda 157 47 1 36
Anne 164 50 1 38
Britta 162 49 1 37
Dona 168 50 1 37
Fabrizia 166 49 1 36
Lisa 158 46 1 34
Leonora 163 50 1 36
Giuliana 162 50 1 36
Marta 165 51 1 36
Rosetta 161 48 1 35
Romina 160 48 1 35
show(people('Sex == 1 & Weight > 60', 1:4));
People:
People dataset
Variables
Height Weight Hairleng Shoesize
------- ------- --------- ---------
Gitte 172 64 1 39
If you want to use column numbers, they should be specified with leading dollar sign: '$1'
, '$15'
and so on.
show(people('$1 < 180 & $3 == -1', 1:4));
People:
People dataset
Variables
Height Weight Hairleng Shoesize
------- ------- --------- ---------
Lene 166 47 -1 36
Benito 177 65 -1 41
Giovanni 176 68 -1 42
Romeo 178 75 -1 42
Function find()
can be used to get indices of rows, which meet logical conditions.
i = find(people, 'Sex == 1 & Region == -1');
disp(people.rowNames(i)');
Columns 1 through 7
'Lene' 'Mette' 'Gitte' 'Lotte' 'Heidi' 'Gerda' 'Anne'
Column 8
'Britta'