A collection of datasets containing the evaluation grades of 1044 Portuguese students from two core classes (Mathematics and Portuguese).
mat_df is a data frame with 395 observations.
por_df is a data frame with 649 observations.
full_df is a combination of mat_df and por_df, resulting in 1044 observations.
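The row counts above can be checked with a small sketch. It is written in Python purely for illustration and uses placeholder rows rather than the real data. The `combine` helper, the `subject` tag, and the `final_grade` key are hypothetical names, not part of the datasets. The sketch also assumes the combination is a simple row-wise stack, which the observation counts suggest but the text does not state.

```python
def combine(mat_rows, por_rows):
    """Stack two lists of row dicts, tagging each row with its source class."""
    combined = []
    for subject, rows in (("Mathematics", mat_rows), ("Portuguese", por_rows)):
        for row in rows:
            # "subject" is a hypothetical tag added here for illustration only.
            combined.append({**row, "subject": subject})
    return combined


# Placeholder rows standing in for the real tables (395 and 649 observations).
mat_rows = [{"final_grade": None} for _ in range(395)]
por_rows = [{"final_grade": None} for _ in range(649)]

full_rows = combine(mat_rows, por_rows)
print(len(full_rows))  # 395 + 649 = 1044
```

Tagging each row with its source class keeps the two classes distinguishable after stacking. Whether full_df carries such a column is not stated here.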
These data frames all have 33 variables:
Character vector, student's school
binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira
Character vector, student's sex
binary: "F" - female or "M" - male
Integer, student's age, from 15 to 22
Character vector, student's home address type
binary: "U" - urban or "R" - rural
Character vector, family size
binary: "LE3" - less than or equal to 3 or "GT3" - greater than 3
Character vector, parents' cohabitation status
binary: "T" - living together or "A" - apart
Integer, mother's education
ordinal:
0 - none,
1 - primary education (4th grade),
2 - 5th to 9th grade,
3 - secondary education,
4 - higher education
Integer, father's education
ordinal:
0 - none,
1 - primary education (4th grade),
2 - 5th to 9th grade,
3 - secondary education,
4 - higher education
Character vector, mother's job
nominal:
"teacher" - teacher,
"health" - health care related,
"services" - civil services (e.g. administrative or police),
"at_home" - at home,
"other" - other
Character vector, father's job
nominal:
"teacher" - teacher,
"health" - health care related,
"services" - civil services (e.g. administrative or police),
"at_home" - at home,
"other" - other
Character vector, reason for choosing this school
nominal:
"home" - close to home,
"reputation" - school reputation,
"course" - course preference,
"other" - other
Character vector, student's guardian
nominal: "mother", "father" or "other"
Integer, home to school travel time
ordinal:
1 - less than 15 min.,
2 - 15 to 30 min.,
3 - 30 min. to 1 hour,
4 - greater than 1 hour
Integer, weekly study time
ordinal:
1 - less than 2 hours,
2 - 2 to 5 hours,
3 - 5 to 10 hours,
4 - greater than 10 hours
Integer, number of past class failures
ordinal: n if 1 <= n < 3, else 4
Character vector, extra educational support
binary: "yes" or "no"
Character vector, family educational support
binary: "yes" or "no"
Character vector, extra paid classes within the course subject (Math or Portuguese)
binary: "yes" or "no"
Character vector, extra-curricular activities
binary: "yes" or "no"
Character vector, attended nursery school
binary: "yes" or "no"
Character vector, wants to pursue higher education
binary: "yes" or "no"
Character vector, internet access at home
binary: "yes" or "no"
Character vector, in a romantic relationship
binary: "yes" or "no"
Integer, quality of family relationships
numeric: from 1 - very bad to 5 - excellent
Integer, free time after school
numeric: from 1 - very low to 5 - very high
Integer, going out with friends
numeric: from 1 - very low to 5 - very high
Integer, workday alcohol consumption
numeric: from 1 - very low to 5 - very high
Integer, weekend alcohol consumption
numeric: from 1 - very low to 5 - very high
Integer, current health status
numeric: from 1 - very bad to 5 - very good
Integer, number of school absences
numeric: from 0 to 93
Integer, first period grade
numeric: from 0 to 20
Integer, second period grade
numeric: from 0 to 20
Integer, final grade
numeric: from 0 to 20
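Several of the integer variables above are codes rather than measurements. One way to turn them back into readable labels is sketched here in Python, purely for illustration. The lookup tables copy the codings listed above. The `decode` helper and the keys of the example `record` are hypothetical names, not the actual column names of the data frames.

```python
# Code-to-label tables copied from the variable descriptions above.
EDUCATION = {
    0: "none",
    1: "primary education (4th grade)",
    2: "5th to 9th grade",
    3: "secondary education",
    4: "higher education",
}
TRAVEL_TIME = {
    1: "less than 15 min.",
    2: "15 to 30 min.",
    3: "30 min. to 1 hour",
    4: "greater than 1 hour",
}
STUDY_TIME = {
    1: "less than 2 hours",
    2: "2 to 5 hours",
    3: "5 to 10 hours",
    4: "greater than 10 hours",
}


def decode(code, labels):
    """Return the label for an integer code, rejecting undocumented codes."""
    if code not in labels:
        raise ValueError(f"unexpected code: {code!r}")
    return labels[code]


# Hypothetical record; the key names are assumptions made for this example.
record = {"mother_education": 4, "travel_time": 1, "study_time": 2}

print(decode(record["mother_education"], EDUCATION))
print(decode(record["travel_time"], TRAVEL_TIME))
print(decode(record["study_time"], STUDY_TIME))
```

Raising an error on an undocumented code, rather than returning a default label, makes coding mistakes in the data visible early.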
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
This paper is available at http://www3.dsi.uminho.pt/pcortez/student.pdf