Freeman, Richard, & Lewis, Roger (1998). Planning and
implementing assessment. London: Kogan Page.
WHAT IS ASSESSMENT?
The word “assessment” comes from the Latin “ad sedere”,
meaning “to sit down beside.” Actually, the “sit beside”
language arose less from the friendly idea of mentoring someone
and more from the sense of a legal representative in court
sitting beside a person---500 years or so ago, an assessor
was a person who advised judges on technical points (mostly
having to do with fines and taxes).
Other Meanings:
- fix amount of fine or tax
- impose fine or tax
- estimate value (e.g., home)
- estimate the worth of, judge, or evaluate
Educational Purposes:
1) select
2) certify
3) describe
4) aid learning
5) improve teaching
These could be separated into two main dimensions: development
and judgment.
Distinction
In the UK, assessment is considered to be separate from evaluation.
Assessment focuses on student learning, whereas evaluation
focuses on how the various components of a course (e.g., syllabus,
teacher) perform. Assessment results can be USED for
evaluation, but do not themselves constitute evaluation.
Two Rules of Thumb:
1) assess behaviors representative of required performance
2) use a sufficient sample of behavior
THREE TYPES OF ASSESSMENT
I. Norm-Referenced: establishes a rank order of Ss in
terms of achievement; that is, each S is assessed relative
to others in a given group (e.g., year of school). Most
properly used for selection. Requires the precaution of checking
that selected Ss are above a minimum standard of competency.
Problematic in that it doesn’t measure against a common standard
but rather against a cohort. Therefore, for example, a person who
falls below the cutoff at School A (and is thus not selected)
might fall above the cutoff for selection at another school.
II. Criterion-Referenced: measures Ss’ performance in relation
to an explicit, previously determined standard (for example,
a driving exam). Good CRAs first choose reasonable
standards, make those standards publicly available,
and then test according to the standards. They
are problematic to the extent that any of these three things
is not done.
III. Ipsative (Self-referenced): Ss’ performance is compared
to their own previous performance rather than to objective
standards or the performance of others. Students may also set
their own learning objectives. Problematic if a student
advances relative to his/her own past performance but still
falls short of competency.
Note: these types are not mutually exclusive---you can use them
in combination.
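As a rough illustration of how the three types differ, here is a minimal Python sketch. The function names, the student scores, and the passing standard of 60 are all hypothetical and chosen only for this example; none of them come from the book.

```python
from typing import Dict


def norm_referenced(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank each student within the cohort (1 = highest score).

    Ties are broken by the order of the input; a real ranking scheme
    would need an explicit tie rule.
    """
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def criterion_referenced(scores: Dict[str, float], standard: float) -> Dict[str, bool]:
    """Compare each student with a fixed, previously published standard."""
    return {name: score >= standard for name, score in scores.items()}


def ipsative(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Compare each student with his or her own earlier score."""
    return {name: current[name] - previous[name]
            for name in current if name in previous}


# Hypothetical scores for one small cohort at two points in time.
previous = {"Ana": 40.0, "Ben": 70.0, "Cy": 85.0}
current = {"Ana": 55.0, "Ben": 72.0, "Cy": 80.0}

print(norm_referenced(current))             # rank within the cohort
print(criterion_referenced(current, 60.0))  # meets the assumed standard?
print(ipsative(previous, current))          # change since last time
```

With these made-up numbers, Ana shows the largest ipsative gain yet still falls below the assumed standard, while Cy ranks first in the cohort despite a drop from the earlier score. That is the kind of mismatch the notes warn about for each type.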
RELIABILITY
Two Types: within one instructor’s ratings and among different
instructors.
Ways to Increase Reliability:
- publish specific
performance criteria, ensure they’re understood by everyone
involved, and adhere to them
- get a bigger sample
of the behavior (e.g., more questions on exams)
- get samples of
a bigger variety of behavior (e.g., assessment portfolios)
- adjust grades (e.g.,
curve by removing poor questions and/or by comparing among
assessors)
- redundancy: have
assignments scored by more than one grader
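The redundancy idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grader scores are invented, and simple exact agreement is a crude stand-in for reliability among instructors (it is not a measure named in the book, and chance-corrected statistics such as Cohen's kappa are stricter).

```python
from statistics import mean
from typing import List, Sequence


def exact_agreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of assignments on which two graders gave the same score."""
    if len(a) != len(b) or not a:
        raise ValueError("graders must score the same non-empty set of assignments")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pooled_scores(*graders: Sequence[float]) -> List[float]:
    """Average each assignment's score across all graders."""
    return [mean(scores) for scores in zip(*graders)]


# Hypothetical scores from two graders on the same four assignments.
grader_1 = [4, 3, 5, 2]
grader_2 = [4, 2, 5, 3]

print(exact_agreement(grader_1, grader_2))  # share of identical scores
print(pooled_scores(grader_1, grader_2))    # one averaged score per assignment
```

A low agreement figure suggests the performance criteria are being read differently by different graders; averaging the scores reduces the effect of any one grader but does not fix that underlying disagreement.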
VALIDITY
Quote from H.G. Wells: “The only results we produced were examination
results which merely looked like the real thing. In
the true spirit of an age of individualistic cooperation,
we were selling wooden nutmegs or umbrellas that wouldn’t
open, or brass sovereigns or a patent food without any nourishment
in it.”
Improving validity:
- explain why you
do what you do with regard to assessment
- assess important
rather than trivial outcomes (even if they’re harder to
measure)
- use appropriate
methods of assessment for a given behavior (even if you
have to devise them!)
- make your assessment
activities interesting to motivate students
- assess what you
actually cover in your classes
OTHER CRITERIA
In addition to reliability and validity, consider:
- Authenticity:
was it actually produced by the student?
- Currency:
is the evidence from a recent performance? Often,
we assess once and merely assume the assessment is valid
for all time (as opposed to periodic re-certification).
- Utility:
is the assessment affordable, convenient, and flexible?
We always compromise: e.g., driving tests would be
better if we held them both during the day and at night,
in cars and in trucks, etc.
MODES OF ASSESSMENT
FORMAL VS. INFORMAL
- Formal: structured
events (e.g., exams, presentations)
- Informal: casual
and unplanned, or planned but not counted for credit.
FORMATIVE VS. SUMMATIVE
- Formative: provides
feedback for improving a process.
- Summative: counts
towards a final grade or certification.
FINAL VS. CONTINUOUS
- Final: taking place
only at the end of a course
- Continuous: taking
place throughout a course
PRODUCT VS. PROCESS
- Product: focuses
on end results
- Process: focuses
on the manner in which end results are achieved
SOURCES OF ASSESSMENT DATA
- Students
- Students’ Peers
- Tutors and Graders
- Instructor