TEA in Istanbul: Day 2

Assessing productive skills

The second day started with Sue Hackett’s presentation on assessing speaking and writing.

As a warmer, Sue asked pairs to complete the following sentences:

The CEFR is …

The role of self-assessment is …
Validity is …
Test specifications are important because …

Sue then asked some pairs to shout out the sentences they had come up with. Some of them were as follows:

The CEFR is neutral and objective.
The CEFR standardizes a global framework.
The role of self-assessment is to help learners see their progress.
The role of self-assessment is important for goal setting.
Validity is the key.
Validity is central to the whole process: the first thing and the last thing you think of.
Test specifications are important because they act as a road map for preparing the test.
Test specifications are important because they set standards for test preparation.

Sue started the session by underlining the fact that the CEFR covers the 'Speaking' component very well, whereas the 'Writing' component is covered less thoroughly.

Sue stated that although both are productive skills and students draw on the same linguistic resources, learners use different mental processes for each. For example, textual features, sociocultural norms and patterns of language use differ, as do the cognitive processes involved in production and interaction.

Another thing a tester must always keep in mind is the importance of knowing the learners and their linguistic and social backgrounds while planning and preparing tests.

Following this, Sue asked participants what other aspects of testing writing and speaking they could mention. Some of the comments were as follows:

– Affect and error correction both play a role in writing and speaking, but in different ways.
– Self-correction can be postponed in writing, but it has to be immediate in speaking.
– The lack of planning time in speaking is another aspect to keep in mind.
– It is easier to give feedback on writing; feedback on speaking is less common.
– Fluency is more important in speaking, and accuracy is more of a factor in writing.
– Correctness is possibly valued more in writing.
– The pressure to keep the communication going is greater in speaking.

Assessment of performance

Sue used an analogy while explaining the assessment of performance. A person's performance is the result of many underlying competences, which the tester tries to infer from that performance.

You can see the performance as the tip of the iceberg.

Sue indicated that language assessment task types should resemble real-world task types. She also said that productive tests should involve tasks that meet the following conditions:

Reliability
Construct validity
Authenticity
Interactivity
Impact
Practicality/feasibility

Then the audience commented on the conditions they found interesting.

Some comments were as follows:

– All are equally important, but I would choose construct validity if I had to choose one.
– We don't live our lives on a multiple-choice basis, so tasks should be real-life tasks that engage the learners.
– For a task to be authentic, it must reflect what learners will need to do in real life.

Assessing writing

Sue mentioned the following aspects to consider while planning, preparing and assessing writing tests:

– Genre
– Knowledge of audience
– Context
– Task completion and relevance
– Cognitive complexity
– The process of production: planning, drafting, editing, finalizing
– Syntactic patterns and linguistic complexity
– Coherence and cohesion

When a member of the audience asked about the role of task completion in written tests, Barry commented that task completion depends on the purpose. For example, if the task is about your last holiday and the learner has shown enough language for me to assess his performance, it doesn't matter if he stopped describing his summer at 3 August.

Sue suggested the following contents for speaking test specifications, based on Alderson et al. (1995) as cited in Sari Luoma, Assessing Speaking (CUP, 2004). She noted that the more time you spend at this stage, the more time you will save at the preparation stage.

-The test purpose
-Description of examinees
-Test level
-Definition of construct
-Number of sections and papers
-Time for each section and paper
-Weighting for each section and paper
-Target language situation
-Text types
-Test methods
-Rubrics
-Criteria for marking
-Description of typical performance at each level
-Description of what candidates at each level can do in the real world
-Sample papers and samples of learner performance on task

The session ended with Sue promising to get into the details of these in the afternoon session.

The next session was on Assessing Performance in Speaking and Writing by Barry O’Sullivan.

Barry started by saying that there isn't a perfect way of assessing productive skills.

He shared a few traditional rating scales, both holistic and analytic, and outlined their history. He then shared some newer ideas for creating tests and scales. The following chart shows that not every criterion was assessed in every task: test makers cherry-picked the criteria that best fit each task's requirements and assessed only those.

Barry pointed out that different task types require different scales. For example, a public speaker giving a talk produces a one-way, formal long turn, but the same speaker may chat with other participants during the break. That conversation is interactive: accuracy becomes less important, and both parties are responsible for the communication. Using the same rating scale for both tasks is simply wrong.

Barry claimed that spoken grammar in an interactive task is not the same as in a monologic task. He said that in an interactive task he would remove all the accuracy components, because once you start listening for errors as an examiner, you lose the plot in an interactive speaking test.

He extended the discussion to fluency: in a monologue, fluency is the ability not to stop, basically to keep going. In an interaction, however, fluency works differently, because interactive language doesn't belong to one person; it is co-constructed. We build the conversation together.

He suggested trying to speak to somebody without making any reference to the other speaker, to see what he meant.

Barry gave some statistics:

– A lot of examiners rate the test taker within 3 seconds, which means before the test taker has even started speaking! He commented that this is scary, and then underlined the importance of rater training. Barry also mentioned the importance of finding markers who can mark quickly and consistently.

– Average time taken by an examiner to mark 250 words: 57 seconds.

According to Barry, even though technology helps examiners nowadays, machines cannot assess how well an idea has been communicated; they can only count well. He gave an example: if Ernest Hemingway or James Joyce were tested by machines, their English might be rated at A1 or A2 level. Humans can appreciate how their work comes together as a masterpiece, but machines cannot capture this nuance.

Finally, Barry listed the following suggestions:

– Score every task (however difficult)
– Think about what criteria suit each task
– Double mark everything
– Train raters (ideally for internal consistency)
– Establish evidence of the meaning of scores
– Monitor rater performance in real time
– Analyze test data

After lunch, participants were put into 8 groups to share their experiences of testing writing and speaking. Sue, Zeynep and Barry monitored the groups and joined the discussions.

At 15.15, the groups came back together and asked Barry, Sue and Zeynep the following questions.

Wrap-up session

Q: When we test students, we give them criteria they are familiar with, printed on the exam paper. Is that OK?

A: Yes, so long as it is similar to the one that they are familiar with, that is fine.

Q: Is asking students to write 5-paragraph essays in prep programmes realistic and meaningful? Is there value in teaching organization skills?

A: The issue of asking students to write essays was discussed. The panel mentioned how much time is spent on introductions and conclusions, which are not really represented in the CEFR at all.

A: Developing ideas to support what you want to express, a bit like paragraph writing, should be enough.

Zeynep mentioned the research they conducted at Sabanci University to identify the departments' needs while revising their writing curriculum. They found that the departments asked questions such as the following:

-What is ….?
-What is the difference between x and y?
-Draw conclusions about the topic of…

That is, departments definitely didn’t need or want 5-paragraph essays.

So Sabanci University started to train its students accordingly and to test them in the same way. Students have to attend content-based English classes, and the writing questions come from this content, requiring either short texts or longer responses. This way, the university also ensures attendance.

Q: How many people should mark a piece of writing?

A: At least two, following a standardization session.

Q: Sometimes weak and strong learners are paired up in speaking tests. Is that fair?

A: In interactive tasks, higher achievers are disadvantaged when matched with lower achievers. It would be worthwhile to tag students and pair them with candidates of similar competence.

Q: When the teacher is the only interlocutor, is there a good way to set up the test?

A: You need to change the power relationship between the examiner and the test taker.

You could ask students to choose a topic from a set of core topics and prepare it before the exam day. The test taker then becomes the expert. In such situations, questions cannot be scripted, and untrained examiners cannot manage this. That is the weakness of this kind of exam: a huge amount of training is necessary.

Q: How can we mark students fairly when personality factors affect the interaction, for example when a dominant person leads the task?

A: The best way round this is to give the candidates a variety of tasks, so that if a candidate is disadvantaged in one task, s/he can be advantaged in another.
