Appendix C — Tasks

The key features of each task used throughout the book are summarized below, together with the first rows of its data and a plot of its target variable(s).
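The calls in this appendix rely on several packages being attached. The following is a minimal setup sketch, assuming all of the packages named in the "See more at" help references below (plus mlr3viz for autoplot()) are installed. The vector name book_tasks is only illustrative.

```r
# Assumed setup: loading each package registers its tasks in the mlr_tasks dictionary.
library(mlr3)              # tsk(); mtcars, german_credit, penguins, sonar, spam
library(mlr3viz)           # autoplot() methods for tasks
library(mlr3data)          # penguins_simple
library(mlr3proba)         # rats, precip
library(mlr3spatiotempcv)  # ecuador
library(mlr3cluster)       # usarrests

# Illustrative name: the task keys used in this appendix.
book_tasks = c("mtcars", "german_credit", "penguins", "penguins_simple",
  "sonar", "spam", "rats", "precip", "ecuador", "usarrests")

# Check that every key can be retrieved with tsk().
available = book_tasks %in% mlr_tasks$keys()
names(available) = book_tasks
available
```

If any entry is FALSE, the package that provides that task is probably not attached.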

C.1 Regression Tasks

C.1.1 mtcars

tsk("mtcars")
<TaskRegr:mtcars> (32 x 11): Motor Trends
* Target: mpg
* Properties: -
* Features (10):
  - dbl (10): am, carb, cyl, disp, drat, gear, hp, qsec, vs, wt
tsk("mtcars")$head()
    mpg am carb cyl disp drat gear  hp  qsec vs    wt
1: 21.0  1    4   6  160 3.90    4 110 16.46  0 2.620
2: 21.0  1    4   6  160 3.90    4 110 17.02  0 2.875
3: 22.8  1    1   4  108 3.85    4  93 18.61  1 2.320
4: 21.4  0    1   6  258 3.08    3 110 19.44  1 3.215
5: 18.7  0    2   8  360 3.15    3 175 17.02  0 3.440
6: 18.1  0    1   6  225 2.76    3 105 20.22  1 3.460
autoplot(tsk("mtcars"))

See more at ?mlr_tasks_mtcars.
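The values in the printed summary can also be read from the task object directly. A short sketch using fields of the mlr3 Task class, with mtcars as the example (the same fields should be available for the other tasks in this appendix):

```r
library(mlr3)

task = tsk("mtcars")

task$nrow           # number of rows (32 in the summary above)
task$ncol           # number of columns, target included (11 above)
task$target_names   # name of the target column ("mpg" above)
task$feature_names  # names of the 10 features
task$feature_types  # table of feature names and their storage types
```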

C.2 Classification Tasks

C.2.1 german_credit

tsk("german_credit")
<TaskClassif:german_credit> (1000 x 21): German Credit
* Target: credit_risk
* Properties: twoclass
* Features (20):
  - fct (14): credit_history, employment_duration, foreign_worker,
    housing, job, other_debtors, other_installment_plans,
    people_liable, personal_status_sex, property, purpose, savings,
    status, telephone
  - int (3): age, amount, duration
  - ord (3): installment_rate, number_credits, present_residence
tsk("german_credit")$head()
   credit_risk age amount                              credit_history duration
1:        good  67   1169     all credits at this bank paid back duly        6
2:         bad  22   5951 no credits taken/all credits paid back duly       48
3:        good  49   2096     all credits at this bank paid back duly       12
4:        good  45   7882 no credits taken/all credits paid back duly       42
5:         bad  53   4870    existing credits paid back duly till now       24
6:        good  35   9055 no credits taken/all credits paid back duly       36
16 variables not shown: [employment_duration, foreign_worker, housing, installment_rate, job, number_credits, other_debtors, other_installment_plans, people_liable, personal_status_sex, ...]
autoplot(tsk("german_credit"))

See more at ?mlr_tasks_german_credit.
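For a twoclass task such as this one, it is often useful to check the class labels, which of them is treated as the positive class, and how the classes are distributed. A minimal sketch, assuming only the mlr3 package is attached:

```r
library(mlr3)

task = tsk("german_credit")

task$class_names              # the two levels of the target credit_risk
task$positive                 # the level treated as the positive class
class_counts = table(task$truth())  # number of observations per class
class_counts
```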

C.2.2 penguins

tsk("penguins")
<TaskClassif:penguins> (344 x 8): Palmer Penguins
* Target: species
* Properties: multiclass
* Features (7):
  - int (3): body_mass, flipper_length, year
  - dbl (2): bill_depth, bill_length
  - fct (2): island, sex
tsk("penguins")$head()
   species bill_depth bill_length body_mass flipper_length    island    sex
1:  Adelie       18.7        39.1      3750            181 Torgersen   male
2:  Adelie       17.4        39.5      3800            186 Torgersen female
3:  Adelie       18.0        40.3      3250            195 Torgersen female
4:  Adelie         NA          NA        NA             NA Torgersen   <NA>
5:  Adelie       19.3        36.7      3450            193 Torgersen female
6:  Adelie       20.6        39.3      3650            190 Torgersen   male
1 variable not shown: [year]
autoplot(tsk("penguins"))

See more at ?mlr_tasks_penguins.

C.2.3 penguins_simple

tsk("penguins_simple")
<TaskClassif:penguins> (333 x 11): Simplified Palmer Penguins
* Target: species
* Properties: multiclass
* Features (10):
  - dbl (7): bill_depth, bill_length, island.Biscoe, island.Dream,
    island.Torgersen, sex.female, sex.male
  - int (3): body_mass, flipper_length, year
tsk("penguins_simple")$head()
   species bill_depth bill_length body_mass flipper_length island.Biscoe
1:  Adelie       18.7        39.1      3750            181             0
2:  Adelie       17.4        39.5      3800            186             0
3:  Adelie       18.0        40.3      3250            195             0
4:  Adelie       19.3        36.7      3450            193             0
5:  Adelie       20.6        39.3      3650            190             0
6:  Adelie       17.8        38.9      3625            181             0
5 variables not shown: [island.Dream, island.Torgersen, sex.female, sex.male, year]
autoplot(tsk("penguins_simple"))

See more at ?mlr3data::mlr_tasks_penguins_simple.
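The two penguin tasks differ in row count (344 vs. 333) and in how island and sex are encoded. The sketch below counts the missing values in penguins and compares the row counts. That the difference in rows is explained by dropping incomplete rows is an assumption here; the help page referenced above describes the actual preprocessing.

```r
library(mlr3)
library(mlr3data)

penguins = tsk("penguins")
penguins_simple = tsk("penguins_simple")

# Number of missing values per column in the original task.
n_missing = penguins$missings()
n_missing

# Rows without any missing value, to compare against penguins_simple$nrow.
n_complete = sum(complete.cases(penguins$data()))
c(complete_rows = n_complete, simple_rows = penguins_simple$nrow)

# The simplified task should contain no missing values (assumption to verify).
sum(penguins_simple$missings())
```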

C.2.4 sonar

tsk("sonar")
<TaskClassif:sonar> (208 x 61): Sonar: Mines vs. Rocks
* Target: Class
* Properties: twoclass
* Features (60):
  - dbl (60): V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2,
    V20, V21, V22, V23, V24, V25, V26, V27, V28, V29, V3, V30, V31,
    V32, V33, V34, V35, V36, V37, V38, V39, V4, V40, V41, V42, V43,
    V44, V45, V46, V47, V48, V49, V5, V50, V51, V52, V53, V54, V55,
    V56, V57, V58, V59, V6, V60, V7, V8, V9
tsk("sonar")$head()
   Class     V1    V10    V11    V12    V13    V14    V15    V16    V17    V18
1:     R 0.0200 0.2111 0.1609 0.1582 0.2238 0.0645 0.0660 0.2273 0.3100 0.2999
2:     R 0.0453 0.2872 0.4918 0.6552 0.6919 0.7797 0.7464 0.9444 1.0000 0.8874
3:     R 0.0262 0.6194 0.6333 0.7060 0.5544 0.5320 0.6479 0.6931 0.6759 0.7551
4:     R 0.0100 0.1264 0.0881 0.1992 0.0184 0.2261 0.1729 0.2131 0.0693 0.2281
5:     R 0.0762 0.4459 0.4152 0.3952 0.4256 0.4135 0.4528 0.5326 0.7306 0.6193
6:     R 0.0286 0.3039 0.2988 0.4250 0.6343 0.8198 1.0000 0.9988 0.9508 0.9025
50 variables not shown: [V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, ...]
autoplot(tsk("sonar"))

See more at ?mlr_tasks_sonar.

C.2.5 spam

tsk("spam")
<TaskClassif:spam> (4601 x 58): HP Spam Detection
* Target: type
* Properties: twoclass
* Features (57):
  - dbl (57): address, addresses, all, business, capitalAve,
    capitalLong, capitalTotal, charDollar, charExclamation, charHash,
    charRoundbracket, charSemicolon, charSquarebracket, conference,
    credit, cs, data, direct, edu, email, font, free, george, hp, hpl,
    internet, lab, labs, mail, make, meeting, money, num000, num1999,
    num3d, num415, num650, num85, num857, order, original, our, over,
    parts, people, pm, project, re, receive, remove, report, table,
    technology, telnet, will, you, your
tsk("spam")$head()
   type address addresses  all business capitalAve capitalLong capitalTotal
1: spam    0.64      0.00 0.64     0.00      3.756          61          278
2: spam    0.28      0.14 0.50     0.07      5.114         101         1028
3: spam    0.00      1.75 0.71     0.06      9.821         485         2259
4: spam    0.00      0.00 0.00     0.00      3.537          40          191
5: spam    0.00      0.00 0.00     0.00      3.537          40          191
6: spam    0.00      0.00 0.00     0.00      3.000          15           54
50 variables not shown: [charDollar, charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, ...]
autoplot(tsk("spam"))

See more at ?mlr_tasks_spam.

C.3 Survival Tasks

C.3.1 rats

tsk("rats")
<TaskSurv:rats> (300 x 5): Rats
* Target: time, status
* Properties: -
* Features (3):
  - int (2): litter, rx
  - fct (1): sex
tsk("rats")$head()
   time status litter rx sex
1:  101      0      1  1   f
2:   49      1      1  0   f
3:  104      0      1  0   f
4:   91      0      2  1   m
5:  104      0      2  0   m
6:  102      0      2  0   m
autoplot(tsk("rats"))

See more at ?mlr3proba::mlr_tasks_rats.

C.4 Density Tasks

C.4.1 precip

tsk("precip")
<TaskDens:precip> (70 x 1): Annual Precipitation
* Target: -
* Properties: -
* Features (1):
  - dbl (1): precip
tsk("precip")$head()
   precip
1:   67.0
2:   54.7
3:    7.0
4:   48.5
5:   14.0
6:   17.2
autoplot(tsk("precip"))

See more at ?mlr3proba::mlr_tasks_precip.

C.5 Spatiotemporal Tasks

C.5.1 ecuador

tsk("ecuador")
<TaskClassifST:ecuador> (751 x 11): Ecuador landslides
* Target: slides
* Properties: twoclass
* Features (10):
  - dbl (10): carea, cslope, dem, distdeforest, distroad,
    distslidespast, hcurv, log.carea, slope, vcurv
* Coordinates:
            x       y
  1: 712882.5 9560002
  2: 715232.5 9559582
  3: 715392.5 9560172
  4: 715042.5 9559312
  5: 715382.5 9560142
 ---                 
747: 714472.5 9558482
748: 713142.5 9560992
749: 713322.5 9560562
750: 715392.5 9557932
751: 713802.5 9560862
tsk("ecuador")$head()
   slides       carea   cslope     dem distdeforest distroad distslidespast
1:   TRUE   5577.3916 34.42789 1911.52        15.00      300              9
2:   TRUE   1399.2329 30.71569 2198.66       300.00      300             21
3:   TRUE 351155.1250 32.81444 1988.71       300.00      300             40
4:   TRUE    500.5027 33.90592 2320.49       300.00      300            100
5:   TRUE    671.1807 41.60017 2021.07       300.00      300             21
6:   TRUE    634.3320 30.29457 1838.40         9.15      300              2
4 variables not shown: [hcurv, log.carea, slope, vcurv]
autoplot(tsk("ecuador"))

See more at ?mlr3spatiotempcv::mlr_tasks_ecuador.

C.6 Clustering Tasks

C.6.1 usarrests

tsk("usarrests")
<TaskClust:usarrests> (50 x 4): US Arrests
* Target: -
* Properties: -
* Features (4):
  - int (2): Assault, UrbanPop
  - dbl (2): Murder, Rape
tsk("usarrests")$head()
   Assault Murder Rape UrbanPop
1:     236   13.2 21.2       58
2:     263   10.0 44.5       48
3:     294    8.1 31.0       80
4:     190    8.8 19.5       50
5:     276    9.0 40.6       91
6:     204    7.9 38.7       78
autoplot(tsk("usarrests"))

See more at ?mlr3cluster::mlr_tasks_usarrests.