A Study of Lexical Chunk Use in the Argumentative Writing of Non-English-Major College Students
--A Corpus-Based Study of a Writing Corpus of Non-English Majors at Guizhou University
Chapter One Introduction
1.1 Research Background
The most important reason for teaching writing to students of English as a foreign language is, of course, that it is a basic language skill, just as important as listening, speaking and reading. For EFL learners, the use of words plays an important role in producing effective writing. Michael Lewis (1993) proposed the term lexical chunks. After Wray (2000) pointed out the deficiencies in EFL learners' word use and advocated the significance of learning word combinations, lexical chunks attracted the interest of many researchers. In recent decades, many experts have carried out extensive studies related to lexical chunks in order to improve EFL learners' English writing proficiency. Some emphasize the problems in college students' use of lexical chunks; for example, Chen (2008) stresses the importance of leading students to use lexical chunks properly in their writings. Others focus on the structural types of lexical chunks used in college students' writing; for example, Xu (2010) describes and discusses non-English majors' use of lexical chunks in their writings.
Lexical chunks play an important part in both acquiring and performing language. As Ding and Qi (2005) argue, more empirical studies are needed to support the view that there is a significant correlation between learners' lexical chunk competence and their written performance. Meanwhile, with the development of corpus linguistics, numerous natural language materials and many powerful corpus software tools have emerged, and large-scale corpora have been built both at home and abroad, for instance BNC (British National Corpus), ANC (American National Corpus), LOCNESS (Louvain Corpus of Native English Essays), CLEC (Chinese Learners' English Corpus), COLSEC (Chinese Learners' Spoken English Corpus) and PACCEL (Parallel Corpus of Chinese EFL Learners).
Based on these corpora, many researchers have begun to carry out further corpus-based studies of lexical chunks in the writing of Chinese EFL learners. For example, Wei (2007) studies the phrasal characteristics of Chinese EFL learners' spoken English in COLSEC. Qi and Ding (2011) make a comparative analysis of the lexical chunks used in American and Chinese students' spoken English. Wang and Liu (2013) study the lexical chunks used to express standpoints in the academic writing of Chinese EFL learners. The present study will likewise analyze how lexical chunks are used in EFL learners' writings.
However, the large-scale corpora applied in many studies were built years ago, for example CLEC (2003), COLSEC (2005) and PACCEL (2008). The features of the lexical chunks students use now are probably not the same as those of the past few years. The corpus to be built in this study was therefore drawn from an online English writing contest held in 2015 by the website www.pigai.org. About one million students took part, and over three thousand of them were non-English majors from Guizhou University. This up-to-date and massive amount of natural language material makes it a critical ingredient for fulfilling the needs of the present study. Based on the writings completed by the students of Guizhou University, the author will build a corpus named the Written English Corpus of Guizhou University Learners (WECGUL for short) and make a corpus-based analysis of the use of lexical chunks by non-English majors in Guizhou University.
1.2 The Purpose of this Study
With the introduction of lexical chunks and corpus linguistics, first of all, this study reveals the importance of lexical chunks and corpora in English teaching and learning.
Secondly, the present study focuses on discovering the overall features of the lexical chunks used by non-English-major EFL learners of Guizhou University. Besides, the author believes that it is worth making a correlative analysis of the lexical chunks in high-score and low-score writings.
Finally, on the basis of the corpus-based study, this thesis will also present a true picture of the factors influencing the use of lexical chunks in EFL learners' writings by conducting and analyzing a questionnaire survey and an interview.
1.3 Significance of the Study
The significance of this study lies in a few aspects. First, with the help of a large-scale corpus, the results of the study will be more objective. Besides, the present study intends to offer a detailed description of the features of EFL learners' usage of lexical chunks after analyzing the structural features of the lexical chunks used by the students. Moreover, this study also stresses the correlative analysis of the lexical chunks used in the high-score and low-score writings of EFL learners in Guizhou University. Finally, through this study, teachers may gain a clear view of students' writing abilities, which will help them identify and address problems or difficulties in teaching English writing, and students may gain a clearer understanding of how to use lexical chunks in their English writing. All in all, it is hoped that the present study will bring some insights to both English teachers and learners in improving students' English writing.
1.4 Organization of the Study
Based on the previous studies, and in order to make a corpus-based study of the use of lexical chunks by students in Guizhou University, the present study is organized as follows:
Chapter one is the introductory part of the study, which provides the research background, purpose and significance of the study. Then, it offers an overview of the present study.
Chapter two presents the relevant literature review for the study. Firstly, it covers some basic concepts of lexical chunks, such as their definitions, classifications and functions. Then, it traces the development of corpora and corpus linguistics. The last part of this chapter reviews the relevant studies on lexical chunks from two aspects: previous studies on lexical chunks, and previous corpus-based studies of lexical chunks at home and abroad.
Chapter three describes the design and methodology of the study. It mainly emphasizes three aspects: 1) the research questions and participants; 2) the instruments used in this study; 3) the steps for data collection and analysis.
Chapter four is the major part of the study, which provides the specific descriptions and results on the usage of lexical chunks by EFL learners in Guizhou University. It also includes the results from the correlative analysis of high-score and low-score writings.
Chapter five contains the conclusions based on the major findings. According to the findings obtained in the above analysis, the author gives some useful explanations and descriptions of lexical chunks for teaching and learning. Besides, the limitations of the present study as well as suggestions for further English teaching and study are also listed in this part.
Chapter 2 Literature Review
2.1 Theories on Lexical Chunks
In this chapter, firstly, some relevant theories and studies on the development of lexical chunks and corpus linguistics are introduced. It begins with the introduction of lexical chunks, including their definitions and classifications. The second section will present a review of studies on corpora and corpus linguistics. Finally, the relevant studies of lexical chunks and corpus linguistics at home and abroad are presented.
2.1.1 Definition and Characteristics of Lexical Chunks
The term lexical chunk was first proposed by Becker in 1976, defined as a special multi-word phenomenon existing between traditional grammar and the lexicon. Later, many names were given to lexical chunks, such as "fixed and semi-fixed expressions" (Fraser, 1970), "prefabricated routines and patterns" (Brown, 1973), "holophrases" (Corder, 1973), "praxons" (Bateson, 1975), "gambits" (Keller, 1979), "lexicalized sentence stems" (Pawley and Syder, 1983), "lexical phrases" (Nattinger and DeCarrico, 1992), "prefabricated or semi-prefabricated chunks" (Bollinger, 1975), and "multiword chunks" (Lewis, 1993). Among these terms, the most commonly used is lexical chunks. Michael Lewis (1997) stated that "lexical chunk is like an umbrella term which includes all the other terms". Biber (1999) classified multi-word expressions into four categories: idioms, collocations, lexico-grammatical associations and lexical bundles. Some representative definitions of lexical chunks are as follows:
Jespersen (1976) stated that free expressions are created in different cases while chunks are drawn from human memory. These chunks refer to a whole sentence, a phrase, a single word, or even only part of a word.
Pawley and Syder (1983) are among the first researchers to realize the significance of "sentence-length expressions". In their account, a lexical chunk is "a unit of clause length, and its fixed elements form a standard label for a culturally recognized concept".
Nattinger and DeCarrico (1992) emphasized the pragmatic function of lexical chunks. They insisted that lexical chunks are stored and retrieved as wholes rather than as individual words.
Biber et al. (1999) studied lexical chunks with a quantitative approach based on corpus data from the Longman Grammar of Spoken and Written English. They found that lexical chunks are characterized by "occurring at least 10 times per million words in the corpus". According to this criterion, Biber (1999) further classified lexical chunks in speech into 14 categories.
Wray and Perkins (2000) put forward a definition that has proved influential. They defined a lexical chunk as "a sequence, continuous or discontinuous, of words or other meaning elements, which is, or appears to be, prefabricated".
Altenberg (1994) proposed his own definition: any continuous string of words occurring more than once in identical form. The present study will mainly apply Altenberg's classification.
In summary, no matter what terms researchers use, they are defining the same linguistic phenomenon; different researchers simply choose the term that best meets their specific research needs. Drawing on the various definitions mentioned above, the lexical chunks examined in this study refer to the fixed or semi-fixed formulaic frames between traditional grammar and lexical items. These chunks can be stored and easily retrieved as wholes from human memory without grammatical analysis.
2.1.2 Classification of Lexical Chunks
However, given the various definitions of lexical chunks, there is no universally accepted classification. Structural and functional aspects are the two main perspectives from which lexical chunks are classified.
From the structural perspective, the classifications proposed by Lewis and by Nattinger and DeCarrico seem to be the most widely accepted.
Nattinger and DeCarrico (1990) collected a large amount of spoken and written discourse from corpora abroad and carried out a detailed classification based on the form and function of lexical phrases. First, poly-words, which are short phrases such as "as you know". Second, institutionalized expressions such as "it's very kind of you". Third, phrasal constraints like "give to". Last, sentence builders such as "it has been acknowledged that".
Lewis (1997) identifies five types of chunks: words, which he suggested have traditionally been considered separate units; poly-words; collocations or word partnerships; institutionalized utterances; and sentence frames and heads.
Becker (1975) pays particular attention to the use of lexical chunks in spoken and written English. He believes that phrases should be treated as the building blocks for the formation of discourse, and he introduced the innovative idea of treating word phrases as units. Becker divides lexical chunks into six types: poly-words, phrasal constraints, meta-messages, sentence builders, situational utterances and verbatim texts.
Altenberg (1998) points out the differences between function and form. He worked out three categories of lexical chunks from the perspective of grammar. The first type is incomplete phrases, which are generally word sequences without a content word, such as "a great number of". The second type is clause constituents, such as "it can be concluded that". The third type is full clauses: chunks with relatively complete sentence structures, often stored as wholes in people's minds, including both independent and dependent clauses, for example "as you know". The detailed classification of lexical chunks made by Altenberg (1994) is listed in table 2.1 below.
According to Altenberg’s classifications of lexical chunks as full clause, clause constituents and incomplete phrases, the author in this present study proposes more specific distinctions on full clause, clause constituents and incomplete phrases. First, the full clause, the lexical chunks must be completed English sentences, which have subjects and objectives. Secondly, the clause constituents should include subjects and an objectives, and could not exist independently without being in a full clause. Lastly, incomplete phrases should involve completed word groups.
2.1.3 Functions of Lexical Chunks
It has been widely acknowledged that lexical chunks play a great pragmatic role in language communication. Many researchers have conducted studies on their pragmatic functions in both spoken and written language, for example Pawley and Syder (1983), Nattinger and DeCarrico (1992) and Rosamund Moon (1998).
Nattinger and DeCarrico (1992) divided the functions of formulaic expressions into three groups: social interactions; necessary topics, that is, formulaic language about topics learners are frequently asked about; and discourse devices, which connect the meaning and structure of discourse.
Altenberg (1994) discusses pragmatic functions including qualifiers, quantifiers, conjunctions, and temporal and spatial expressions.
Moon (1998) describes five main pragmatic functions of lexical chunks specifically in writing: the informational, evaluative, situational, moralizing and organizational functions.
Wray (2002) argued that lexical chunks can contribute to language fluency and accuracy. In her view, using lexical chunks not only saves processing effort but also regulates production and maintains a particular rhythm and flow that helps speakers speak fluently (Wray, 2002, p. 75).
According to the functions mentioned above, in the present study, the pragmatic functions of lexical chunks in the self-built corpus will be analyzed in Chapters four and five.
2.2 Studies on Lexical Chunks
Previous studies on lexical chunks fall mainly into two strands: theory-oriented studies in the early stages and corpus-based studies in recent years.
2.2.1 Studies on Lexical Chunks Abroad
In the early stages, most studies focused on the construction of theoretical frameworks. There are a number of terms used to refer to lexical chunks, such as "prefabricated routines and patterns" (Brown, 1973), "formulaic utterances" (Fillmore, 1976), "formulaic speech" (Ellis, 1985), "lexical phrases" (Nattinger & DeCarrico, 1992), "lexical chunks" (Lewis, 1993), "recurrent word combinations" (Altenberg, 1998) and "formulaic sequences" (Wray, 1999). Becker (1975) argues that phrases consisting of more than one word should be treated as chunks for the formation of utterances. Altenberg (1998) sets up a framework and criteria for the structural categories and pragmatic functions of lexical chunks.
In addition, many studies have examined the definitions and classifications of lexical chunks. For example, Pawley and Syder (1983) pointed out that mature native speakers store many complex lexical chunks in their minds. Al-Zahrani (1998) did a similar study on the relationship between knowledge of lexical chunks and students' writing proficiency and found a significant difference in the use of lexical chunks across proficiency levels, with the number of lexical chunks increasing over the students' academic years. Howarth (1998), examining 238,000 words of academic written texts, finds that 31-40% of the lexical chunks belong to the group of collocations and idioms.
Altenberg and Granger (2001) find that about 70% of daily oral expressions are composed of prefabricated lexical chunks. Wiktorsson (2003) investigates the lexical chunks used by Swedish learners of English in their writing.
2.2.2 Studies on Lexical Chunks at Home
In the past decades, lexical chunks have become a hot topic in second language acquisition research field and studies on the lexical chunks have reached a fever pitch since 2002. The domestic studies mainly focus on theoretical research and review research of lexical chunks.
In recent years, empirical studies of lexical chunks have been strengthened at home. Liu (2003) finds that lexical chunks are an important part of language production in the written English of Chinese non-English-major college students. Yan (2003) showed that lexical chunks should be systematically incorporated into second language teaching curricula. Diao Linlin (2004) surveyed Chinese English majors' chunk competence. Wang and Zhang (2006) study the use of chunks in Chinese learners' English argumentative writing based on the SWECCL corpus, a corpus of Chinese English-major undergraduates. Liu (2008) carried out empirical research on the influence of formulaic language on students' oral ability. Ma (2009) investigated the use of lexical bundles in the timed essays of Chinese English-major undergraduates. Su (2010) studies the correlations between verb-centered lexical chunks and the development of second language writing accuracy, complexity and fluency. Wei and Lei (2011) investigated the use of lexical bundles in the academic writing of Chinese doctoral students of English; four-word lexical bundles were identified and analyzed.
2.3 Studies on Corpus and Corpus Linguistics
2.3.1 Corpus
Thomas and Short (1996) observed that corpora have become mainstream. Kennedy (1998) points out that a corpus is a body of written text that can be viewed as a basis for linguistic analysis and description. Considering the state of a corpus, corpora can be classified as diachronic or synchronic, and, in another pairing, as open or closed. Today there are general corpora, specialized corpora, spoken corpora, written corpora, native-speaker corpora, learner corpora, monolingual corpora, and parallel and multilingual corpora.
2.3.2 Corpus Linguistics
Corpus linguistics is a language discipline directly related to corpora. Corpus studies combine quantitative and qualitative methods to investigate linguistic phenomena. McEnery and Wilson (1996) claim that corpus linguistics is a method of verifying language hypotheses on the basis of data, a language description that begins with data. Sinclair (1997) claims that corpus linguistics is the study of language on the basis of corpora; it differs from traditional linguistics in its insistence on authentic examples of language in use. Biber et al. (1998) regard corpus linguistics as the study of "association patterns". John Sinclair (1998) points out that speakers do not have access to the subliminal patterns which run through a language. Cermak (2003) claims that it now seems obvious that the highest and most justified expectations in linguistics are closely tied to corpus linguistics. Teubert (2005) points out that the corpus is considered the default resource for almost anyone working in linguistics.
2.4 Previous Corpus-based Studies of Lexical Chunks at Home and Abroad
2.4.1 Relevant Corpus-based Studies of Lexical Chunks Abroad
The third period of corpus linguistics began in the 1980s. The field developed fast and flourished as never before, exploring many new areas, such as large corpora (BNC, ANC, LLC and LOCNESS), DDL (Data-Driven Learning), CIA (Contrastive Interlanguage Analysis) and MCALL (Multi-media Computer Assisted Language Learning).
2.4.2 Relevant Corpus-based Studies of Lexical Chunks at Home
In 1993, the Language Centre of the Hong Kong University of Science and Technology established a computer science corpus. Gu Yueguo (1998) and Li Wenzhong (1999) have contributed a lot to corpus-based studies of foreign language teaching and learning. In 2003, Professor Gui Shichun of Guangdong University of Foreign Studies, in cooperation with Yang Huizhong of Shanghai Jiao Tong University, built CLEC (Chinese Learner English Corpus) and COLSEC (Chinese Learners' Spoken English Corpus). CLEC is of great value for studies of the interlanguage of Chinese English learners. The corpus contains 1,100,000 English words in total, consisting of written texts from high school students and from college students of both non-English and English majors. SWECCL (Spoken and Written English Corpus of Chinese Learners) was built by Professor Wen of Nanjing University in collaboration with the Foreign Language Teaching and Research Press. This corpus contains about two million words and involves two sub-corpora: WECCL (Written English Corpus of Chinese Learners) and SECCL (Spoken English Corpus of Chinese Learners). The data of SECCL are gathered from the spoken section of the Test for English Majors (TEM-4).
Chapter 3 Research Design and Methodology
In the first section of this chapter, the author reviews the research objectives and questions. In the second section, the author introduces the corpus used. In the third section, the software instruments used in the present study are listed in detail. In the fourth section, the specific research steps are presented. In the last section, the author describes the process of data collection.
3.1 Research Questions
Based on the previous studies of lexical chunks in written English and the application of the corpus-based approach, in this study the author investigates the structural features and correlations of the lexical chunks used in non-English-major students' argumentative English writing, answering the following questions:
1. What are the structural features of lexical chunks used in WECGUL?
2. Is there any correlation between the use of lexical chunks and the scores of the writings?
3. What are the main factors influencing the use of lexical chunks by non-English majors in Guizhou University?
3.2 Participants
In this study, the participants are non-English-major students from Guizhou University who took part in the online writing competition. Firstly, the competitors include freshmen and sophomores, numbering over three thousand. Secondly, identified by their student IDs, the questionnaire population is three hundred students, now sophomores and juniors since the competition was held in 2015, all chosen by random sampling. Lastly, thirty students from the questionnaire sample will be randomly selected by the author for the interview investigation.
3.3 Research Methodology
3.3.1 Research Instruments
3.3.1.1 Corpus used
In this study, the author adopts the self-built corpus named WECGUL, which was set up from a nationwide writing competition held by the well-known English writing website www.pigai.org. The theme of this activity was Millions of Compositions on a Same Title, "We are What We Read", which fit National Reading Month in April. During the two months of the contest, the website received over a million compositions submitted by high school and college students after repeated revision. Among them, more than three thousand are from Guizhou University.
3.3.1.2 Software used
Antconc version 3.2.2 is a freeware corpus analysis toolkit for concordancing and text analysis, designed by Lawrence Anthony of Waseda University in Japan. It can produce concordance distribution plots and KWIC (key-word-in-context) concordance lines, and analyze word clusters, collocates and word frequencies. It offers several tools: "Concordance", "Concordance Plot", "File View", "Clusters", "N-Grams", "Collocates", "Word List" and "Keyword List". The Word List tool is applied to count the total number of words appearing in the corpus; from this result, the type-token ratio of WECGUL can be worked out. The Clusters tool outputs an ordered list of word sequences, found in the target files, in the left frame of the main window. Large files consist of many clusters of different lengths, and the author can choose the minimum and maximum length (number of words) of each cluster as well as the minimum frequency of clusters to be displayed. The present study will apply this tool to obtain the 3-6-word lexical chunks.
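As an illustration of what the Clusters/N-Grams function computes, the following Python sketch extracts all 3-6-word sequences from a text and keeps those above a minimum frequency. The sample sentence and the threshold are invented for illustration, and AntConc itself may tokenize differently.

```python
import re
from collections import Counter

def extract_ngrams(text, min_n=3, max_n=6, min_freq=2):
    """Count contiguous word sequences of length min_n..max_n
    and keep those occurring at least min_freq times."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Invented sample text; in the study the input would be the corpus files.
sample = "It is said that we are what we read. We are what we read indeed."
chunks = extract_ngrams(sample)
```

In the same way that AntConc lists recurring clusters, `chunks` here maps each recurring 3-6-word sequence (for example "we are what we read") to its frequency.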
The Chinese version of Text Editor 3.0 can be downloaded freely from the Internet. It is a powerful and versatile text editor and word processor designed to make text editing as convenient as possible.
SPSS (Statistical Package for the Social Sciences) 13.0 is frequently used for data analysis; here it is used to make a correlation analysis of the usage of lexical chunks and to process the results from both the questionnaires and the interviews.
3.3.1.3 Questionnaire
In order to identify the factors influencing the use of lexical chunks by students of Guizhou University, a questionnaire is chosen as one of the instruments in the present study. The reasons are as follows: firstly, the versatility of a questionnaire enables this investigation to gain much information about the participants. Secondly, a questionnaire survey is relatively effective, because collecting questionnaire data is quick, economical and efficient. Thirdly, the responses are gathered in a standardized way, so the result is more objective. Lastly, it is easy for the author to analyze the results of the questionnaire with statistical tools such as SPSS and Excel.
3.3.1.4 Interview
In the present study, the interview is adopted to make up for the deficiencies of the questionnaire investigation. In an interview, the author of this study and the participants can communicate with each other directly. Additionally, the interview can provide deeper information on specific questions.
3.3.2 Data Collection
There are several steps in the process of making a self-built corpus, such as collecting the raw texts, reorganizing and cleaning the texts, and classifying the texts. To establish the self-built corpus in the present study, the instruments used are a USB flash drive or portable hard disk with a large capacity, the text program in Microsoft Windows, and the Chinese version of Text Editor 3.0.
In order to answer research question 1, the author uses Antconc version 3.2.2 to extract the 3-6-word lexical chunks and their frequencies in WECGUL. The Word List and N-Grams tools are used to work out the type-token ratios and to select the 3-6-word lexical chunks; the process of extracting them will be introduced in the research process.
3.3.2.1 Steps for making WECGUL
Based on the compositions provided by the English writing website www.pigai.org, the author establishes the self-built corpus WECGUL to answer the research questions. There are several important procedures for making WECGUL.
Step 1 Cleaning raw texts with Text Editor 3.0
It is indispensable for the builder to make sure that the raw texts are clean and qualified; unqualified raw texts would strongly distort the data analysis. An example of an unqualified raw text is presented in figure 3.1 in the Appendices.
A qualified raw text should meet these conditions: first, there should be no extra spaces between words and sentences; second, there should be no blank lines between two paragraphs. With the help of Text Editor 3.0, the author converts unclean texts into qualified ones suitable for the data analysis in the following sections. Although the cleaning can be done with Text Editor 3.0, it is still time-consuming, since there are over three thousand raw texts. An example of a reorganized text is presented in figure 2.2 in the Appendices.
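The cleaning rules above (no extra spaces, no blank lines between paragraphs) can be sketched in Python. This is only an illustrative equivalent of the Text Editor 3.0 step, not the tool actually used, and the raw text below is an invented example.

```python
import re

def clean_raw_text(raw):
    """Collapse repeated whitespace within lines and drop blank lines."""
    lines = [re.sub(r"\s+", " ", line).strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)

# Invented example of an unqualified raw text.
raw = "We  are what   we read.\n\n\nReading shapes  us."
cleaned = clean_raw_text(raw)
```

Applied to all three-thousand-odd files in a loop, such a script would remove exactly the two kinds of noise the step targets.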
Step 2 Set up the self-built corpus WECGUL
As the natural language materials come as one large text file in Microsoft Windows, the author builds a separate text file for each composition. The name of each file consists of the student's score and student ID, which makes it convenient for the author to analyze the data with the software. Building the corpus takes a long time, because there are over three thousand compositions. After completing the two procedures above, the self-built corpus WECGUL was ready to meet the demands of the study.
Step 3 Name the files
Firstly, from the compositions finished by more than three thousand undergraduates, the author builds more than three thousand files by hand according to the students' scores and student IDs. Secondly, a series of software instruments is employed to turn the raw texts into the target texts; the process of using these instruments is detailed in the following section. At last, the author extracts the 3-6-word and frequently occurring lexical chunks in WECGUL.
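The naming convention described above (score plus student ID) could be automated along these lines; the folder name, score and student ID below are hypothetical examples, not real data from the contest.

```python
from pathlib import Path

def save_composition(folder, score, student_id, text):
    """Save one composition to '<score>_<studentID>.txt' inside folder."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{score}_{student_id}.txt"
    path.write_text(text, encoding="utf-8")
    return path.name

# Hypothetical score, student ID and output folder, for illustration only.
name = save_composition("wecgul_files", 85, "2015001", "We are what we read ...")
```

Because the score comes first in each file name, sorting the directory listing immediately groups the high-score and low-score writings needed for the later comparison.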
3.3.3 Data Analysis
The present study adopts a corpus-based approach, which means that all the descriptions and analysis are based on a large number of lexical chunks. Both quantitative and qualitative approaches are used for data analysis.
As for research question 1, quantitative methods are mainly used to analyze the data collected from WECGUL. For example, the type-token ratio can present the richness and diversity of the words in WECGUL. By observing the richness and diversity of the words used in the students' writings, the author can form an overall view of the command of English vocabulary shown in the writings of students in Guizhou University. Also, by using the software Antconc, the author can select the high-frequency 3-6-word lexical chunks used by students in Guizhou University. Then, with the high-frequency lexical chunks and the classification of lexical chunks adopted in the second chapter, the author can analyze the structural features of the lexical chunks used by the students in their argumentative English writing.
The second research question also calls for quantitative methods, to obtain the correlation of the 3-6-word lexical chunks between the high-score and low-score writings in WECGUL. In the same way, the author starts by working out the type-token ratio to observe the richness and diversity of the words used in the high-score and low-score writings. Next, by using the software Antconc, the author selects the high-frequency 3-6-word lexical chunks used by students of the two groups. Then, the author makes a comparative overview of the lexical chunks used by students with high and low scores; this contrastive step helps reveal the distinctions between the two groups. The medium group is not taken into account, since it is less informative for this comparison than the high-score and low-score writings. At last, the author uses SPSS to make a correlative analysis between the high-score and low-score writings.
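The correlation itself is computed in SPSS; as a transparent illustration of the statistic involved, Pearson's r can also be computed directly, as sketched below. The scores and chunk counts are invented example values, not data from WECGUL.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example values: essay scores and lexical chunks per essay.
scores = [62, 70, 75, 81, 90]
chunk_counts = [8, 11, 12, 15, 19]
r = pearson_r(scores, chunk_counts)
```

A value of r near +1 would indicate that essays with more lexical chunks tend to receive higher scores, which is exactly the relationship the second research question probes.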
Research question 3 is also indispensable, since the author needs to know what factors account for the students' different uses of lexical chunks at Guizhou University. To answer it, both quantitative and qualitative methods are employed. First, the author designs a questionnaire and distributes it to the students who took part in the writing contest. In addition, face-to-face interviews are conducted to make up for what the questionnaire cannot capture. Then, with the results of the questionnaire and the interviews, the author makes a qualitative analysis of the factors that influence the students' different uses of lexical chunks. Combining Altenberg's (1998) classification of lexical chunks with the quantitative results, the author also qualitatively analyzes the lexical chunks used by Guizhou University students in their argumentative writing.
3.3.3.1 Steps for extracting the clusters to answer research question 1
Step 1 Make a word list of WECGZUL
In order to gain an overall view of WECGZUL, the author uses the Word List tool in AntConc to make the word list. The specific steps are as follows:
1. Open AntConc and choose the files.
2. Click the "Word List" button to run the tool.
By doing this, the word list of the compositions in WECGZUL is obtained; it can be seen in Figure 3.3 in the Appendices.
The type-token ratio is regarded as a measure of the vocabulary richness of a corpus: the types are the different words that occur, the tokens are the total number of running words, and the ratio is the number of types divided by the number of tokens. A high type-token ratio implies rich and diverse vocabulary. Working out this ratio helps with the analysis of the vocabulary richness and diversity of the non-English majors at Guizhou University.
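The word-list and type-token ratio steps can be sketched in Python as follows; the simple tokenizer and the two sample sentences are illustrative assumptions, not data from WECGZUL.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(texts):
    """A frequency-ranked word list, as AntConc's Word List tool produces."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common()

def type_token_ratio(texts):
    """Number of distinct word types divided by the total number of tokens."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    return len(set(tokens)) / len(tokens)

sample = ["We must try our best to protect the environment.",
          "The environment is important and we must protect it."]
print(word_list(sample)[:3])               # the most frequent types
print(round(type_token_ratio(sample), 2))  # 13 types / 18 tokens = 0.72
```

A higher ratio would indicate that a set of writings draws on a wider range of word types relative to its length.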
Step 2 Run AntConc to extract lexical chunks
In order to obtain the most frequently used 3- to 6-word lexical chunks in WECGZUL, the N-Grams function, part of the Clusters tool of AntConc 3.2.1, is applied. The specific steps are as follows:
1. Open AntConc, choose the files, and click the "Clusters" button.
2. Select N-Grams, and set both "Min Size" and "Max Size" to 3.
3. Click the "Start" button to run the tool; the result for 3-word chunks can be seen in Figure 3.4 in the Appendices.
The 4- to 6-word lexical chunks are obtained in nearly the same way as the 3-word chunks; the only difference is that the N-gram size is set to 4 for 4-word chunks, 5 for 5-word chunks, and 6 for 6-word chunks. The resulting 4- to 6-word clusters are presented in Figures 3.5, 3.6 and 3.7.
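What the N-Grams function computes can be sketched as follows; the two-sentence corpus and the simple tokenizer are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def ngram_frequencies(texts, n):
    """Count every contiguous n-word sequence across the texts,
    mirroring AntConc with Min Size = Max Size = n."""
    counts = Counter()
    for text in texts:
        toks = tokenize(text)
        for i in range(len(toks) - n + 1):
            counts[" ".join(toks[i:i + n])] += 1
    return counts

corpus = ["more and more people pay attention to it",
          "more and more students pay attention to health"]
for n in range(3, 7):   # 3-word up to 6-word clusters
    print(n, ngram_frequencies(corpus, n).most_common(2))
```

In this toy corpus, "more and more" and "pay attention to" each occur twice, which is why they rise to the top of the 3-word list.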
Step 3 Select 3- to 6-word lexical chunks
As the 3-word, 4-word, 5-word and 6-word chunks are presented by the N-Grams tool, the author selects the most frequently used lexical chunks in the texts. In this step, the author builds another four files to store the most frequently used 3-word, 4-word, 5-word, and 6-word chunks respectively.
Step 4 Criteria and Manual Selection
According to the definition of lexical chunks given in Chapter Two, the non-chunk sequences need to be cut off, for example word sequences such as "are what we" or "of what people living", which are incomplete in form and meaning. Selecting the useful lexical chunks by hand takes a long time, so this step involves considerable extra work. The author then works out the type-token ratio of the 3-word, 4-word, 5-word, and 6-word chunks respectively to analyze the lexical density at each chunk length; this supports the analysis of the richness and diversity of the 3- to 6-word lexical chunks used by the non-English majors at Guizhou University. Lastly, the author divides the useful lexical chunks into full clauses, clause constituents and incomplete phrases, the categories introduced in 2.1.2. After completing the steps above, the author can describe the structural features of the lexical chunks used by non-English majors at Guizhou University.
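The weeding of non-chunk sequences cannot be fully automated, but a rough pre-filter can shrink the candidate list before the hand selection. The boundary stoplist below is a small illustrative assumption; the final decisions remain the author's manual work.

```python
# Sequences that begin or end with a dangling function word are unlikely
# to be complete in form and meaning, so they are filtered out first.
BOUNDARY_STOPS = {"a", "an", "the", "of", "and", "that", "which", "are", "is"}

def likely_chunk(ngram):
    words = ngram.split()
    return words[0] not in BOUNDARY_STOPS and words[-1] not in BOUNDARY_STOPS

candidates = ["pay attention to", "are what we",
              "of what people living", "more and more"]
print([c for c in candidates if likely_chunk(c)])
```

The filter keeps "pay attention to" and "more and more" but rejects the two incomplete sequences cited above.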
3.3.3.2 Steps to answer research question 2
Since the files in WECGZUL are named with students' scores and student IDs, it is easy to separate the high-score writings from the low-score writings. Thus another two files are built to collect the texts of the two groups.
Step 1 Make word lists of the high-score and low-score writings
Before making the word lists, the author has to delimit the two groups. The total number of participants in WECGZUL is multiplied by 27%: the top 27% of essays by score are taken as the high-score writings, and the bottom 27% as the low-score writings. The specific steps are nearly the same as above.
1. Open AntConc and choose the files.
2. Click the "Word List" button to run the tool.
Then the word lists of the high-score and low-score writings are obtained. From these lists, the type-token ratios of the two groups can be worked out to indicate the richness and diversity of the words used.
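Forming the two groups can be sketched as follows; the student IDs and scores below are invented for illustration, and the 27% cut-off follows the description above.

```python
def split_groups(scored_essays, proportion=0.27):
    """Return the top and bottom `proportion` of essays ranked by score."""
    ranked = sorted(scored_essays, key=lambda e: e[1], reverse=True)
    k = max(1, round(len(ranked) * proportion))
    return ranked[:k], ranked[-k:]

essays = [("s01", 92), ("s02", 85), ("s03", 78), ("s04", 66), ("s05", 60),
          ("s06", 55), ("s07", 48), ("s08", 40), ("s09", 73), ("s10", 88)]
high, low = split_groups(essays)   # 27% of 10 essays rounds to 3 per group
print([sid for sid, _ in high])
print([sid for sid, _ in low])
```

The medium-score essays between the two cut-offs are simply not assigned to either group, matching the decision to leave them out of account.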
Step 2 Run AntConc to extract 3- to 6-word lexical chunks in the high-score and low-score writings
Concerning research question 2, in order to carry out a correlation analysis of the 3- to 6-word lexical chunks used by students from Guizhou University, the N-Grams function of the Clusters tool of AntConc 3.2.1 is applied again. The specific steps are as follows:
1. Open AntConc, choose the files, and click the "Clusters" button.
2. Select N-Grams, and set both "Min Size" and "Max Size" to 3.
3. Click the "Start" button to run the tool; the extraction parallels Figure 3.4 in the Appendices. The 4- to 6-word lexical chunks are obtained in nearly the same way; the only difference is that the N-gram size is set to 4 for 4-word chunks, 5 for 5-word chunks, and 6 for 6-word chunks, paralleling Figures 3.5, 3.6 and 3.7.
Step 3 Select useful 3- to 6-word lexical chunks in the high-score and low-score writings
After obtaining the 3- to 6-word lexical chunks from the high-score and low-score writings, the author selects the useful ones and cuts off the non-chunk sequences. Files and texts are then set up manually for the 3- to 6-word chunks respectively. On this basis, the author makes word lists for the 3- to 6-word chunks and works out the type-token ratio for each chunk length, which establishes the differences in vocabulary richness and diversity between the two groups. With SPSS 13.0, both comparative and correlation analyses of the high-score and low-score writings then become possible.
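The correlation analysis that SPSS 13.0 reports amounts to computing Pearson's r between two frequency series; a minimal sketch, with invented per-essay chunk counts, is given below.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical chunk counts for five matched essays in each group
high_counts = [12, 15, 9, 20, 14]
low_counts = [5, 7, 4, 9, 6]
print(round(pearson_r(high_counts, low_counts), 3))
```

An r near 1 would mean the two groups favor the same chunks in similar proportions, while a low r would point to diverging chunk repertoires.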
Step 4 Analyze the differences between the high-score and low-score writings
It is predictable that the vocabularies of the high-score and low-score groups differ in size. However, it is also necessary to examine the frequencies of the different lexical chunks and their structural categories in the two groups. The author therefore builds contingency tables and runs chi-square tests to obtain the chi-square statistics and p values. To be specific, p > 0.05 means there is no difference between the two groups; 0.01 ≤ p ≤ 0.05 implies there is a difference; and p < 0.01 means there is a significant difference. Finally, SPSS 13.0 is employed to make a correlation analysis of the lexical chunks of the high-score and low-score groups.
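The chi-square computation on a 2 × 2 contingency table (e.g. chunk tokens versus other tokens in each group) can be sketched as follows; the counts are invented for illustration, and the p-value formula applies to one degree of freedom.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# hypothetical counts: chunk tokens vs. other tokens, high vs. low group
chi2 = chi_square_2x2(120, 880, 70, 930)
p = math.erfc(math.sqrt(chi2 / 2))  # p value for df = 1
print(round(chi2, 2), p < 0.01)     # statistic, and whether p falls below .01
```

With these illustrative counts the statistic far exceeds 6.63, the critical value at p = 0.01 for one degree of freedom, so the difference would be read as significant under the thresholds above.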
3.3.3.3 Methods to answer research question 3
The questionnaire investigation
The questionnaire was adapted, with some modification by the author, from previous work (Yang, 2012; Jia, 2014). It includes two sections: the first collects the participants' basic information, namely gender and grade, and covers items 1 and 2; the second concerns the students' use of lexical chunks and covers items 3 to 16.
The interview investigation
In the present study, three questions are listed in the interview. During the interview, all the information is recorded by the author with a smartphone for later analysis. After analyzing the results with SPSS, the author can gain a deeper knowledge of the main factors that influence the students' use of lexical chunks.
References
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word-combinations. In A. P. Cowie (ed.), Phraseology: Theory, Analysis, and Applications [M]. Oxford: Oxford University Press, 101-122.
Biber, D., Conrad, S. & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use [M]. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Leech, G. et al. (1999). Longman Grammar of Spoken and Written English [M]. London: Longman.
Conrad, S. & Biber, D. (2001). Variation in English: Multi-Dimensional Studies [M]. London: Longman.
Cortes, V. (2002). Lexical bundles in freshman composition. In R. Reppen, S. M. Fitzmaurice & D. Biber (eds.), Using Corpora to Explore Linguistic Variation [M]. Amsterdam: John Benjamins, 131-145.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie (ed.), Phraseology: Theory, Analysis, and Applications [M]. Oxford: Clarendon Press, 145-160.
Kennedy, G. (1998). An Introduction to Corpus Linguistics [M]. London: Longman.
Milton, J. (2000). Elements of a written interlanguage: A computational and corpus-based study of institutional influences on the acquisition of English by Hong Kong Chinese students [R]. Language Centre Research Report II. Hong Kong: Hong Kong University of Science and Technology, 1-125.
Nattinger, J. R. & DeCarrico, J. S. (2000). Lexical Phrases and Language Teaching [M]. Shanghai: Shanghai Foreign Language Education Press.
Harmer, J. (1998). How to Teach English [M]. Beijing: Foreign Language Teaching and Research Press.
Beatty, K. (2003). Teaching and Researching Computer-Assisted Language Learning [M]. Beijing: Foreign Language Teaching and Research Press.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology [J]. English for Specific Purposes, 397-423.
Staples, S., Egbert, J., Biber, D. & McClair, A. (2014). Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section [J]. Journal of English for Academic Purposes, 214-225.
Jalali, Z. S. & Moin, M. R. (2014). Structure of lexical bundles in introduction section of medical research articles [J]. Procedia - Social and Behavioral Sciences, 719-726.
Sun, B. (2014). Corpus-based chunk application in college English writing [J]. International Conference on Education, Language, Art and Intercultural Communication, 217-222.
何安平(2010),语料库辅助英语教学入门[M]. 北京:外语教学与研究出版社。
梁茂成,李文中,许家金 (2013),语料库应用教程[M]. 北京:外语教学与研究出版社。
卫乃兴,李文中,渡建,梁茂成,何安平 (2014),变化中的语料库语言学[J]. 解放军外国语学院37(1): 1-8。
桂诗春,冯志伟,杨惠中,何安平,卫乃兴,李文中,梁茂成 (2010),语料库语言学与中国外语教学[J]. 现代外语33(4):421-426。
赵晓临,卫乃兴 (2010),中国大学生英语书面语中的态度立场表达[J]. 外语研究119(1):59-63。
陈建生,范丽(2010), 基于语料库的中国学习者英语议论文词块研究[J]. 天津工程师范学院学报 20 (4):67-71。
刁琳琳(2004),英语本科生词块能力调查[J]. 解放军外国语学院学报27 (4):35-38。
丁言仁,戚焱(2005),词块运用与英语口语和写作水平的相关性研究[J]. 解放军外国语学院学报28 (3):49-53。
段士平(2011),基于语料库的英语学习者写作词块特征研究[J].郑州航空工业管理学院学报(社会科学版)30 (1):116-118。
桂诗春,杨惠中(2003),中国学习者英语语料库[M]. 上海:上海外语教育出版社。
刘晓玲,刘鑫鑫(2009),基于语料库的大学生书面语词块结构类型和语用功能研究[J].中国外语6 (2):48-53。
王伟,张红燕(2010),基于语料库的英语专业学生口语词块使用模式研究(英文)[J].语文学刊(外语教育与教学)(6) :1-3。
卫乃兴(2007),中国学生英语口语的短语学特征研究----COLSEC语料库的词块证据分析[J].现代外语30 (3):280-291。
卫乃兴(2003),中国学习者英语口语语料库初始研究[J].现代外语27 (2) :140-149。
文秋芳,丁言仁,王文宇(2003),中国大学生英语书面语中的口语化倾向——高水平英语学习者语料对比分析[J].外语教学与研究35 (4):268-274。
许家金,许宗瑞(2007),中国大学生英语口语中的互动话语词块研究[J].外语教学与研究39 (6):437-443。
许先文(2010),非英语专业研究生二语写作中的词块结构类型研究[J].外语界140 (5) : 42-47。
严维华(2003),词块对基本词汇习得的作用[J].解放军外国语学院学报26 (6) : 58-62。
杨惠中,卫乃兴(2005),中国学习者英语口语语料库建设与研究[M]. 上海:上海外语教育出版社。
杨惠中(2002),语料库语言学导论[M]. 上海:上海外语教育出版社。
张霞(2010),基于语料库的中国高级英语学习者词块使用研究[J].外语界140 (5) : 48-57。
张任东(2010),基于语料库的大学生英文写作中的词块研究[M]. 山西师范大学硕士学位论文。
肖忠华(2012),《语料库语言学:方法、理论与实践》述评[J]. 外语教学与研究 44(6):944-948。
谢爱红(2009),词块使用水平与英语写作成绩相关性研究[J]. 湖南农业大学学报(社会科学版)10(6):71-74。
方秀才(2012),基于语料库的英语教学与研究综述:成就与不足——根据22种语言学类CSSCI来源期刊近30年的统计分析[J]. 外语电化教学 145:19-24。
王芙蓉,王宏利(2015),基于语料库的语言学和工科学术英语词块比较研究[J]. 外语界 167(2):16-24。
梁燕,冯友(2004),近几十年我国语料库实证研究综述[J]. 解放军外国语学院学报 27(6):50-54。
何中清,彭宣维(2011),英语语料库研究综述:回顾、现状与展望[J]. 外语教学 32(1):6-11。
马广惠(2009),英语专业学生二语限时写作中的词块研究[J]. 外语教学与研究(外国语文双月刊) 41(1):54-58。
杨滢滢(2014),英语专业学习者同一主题作文的词汇发展和词块运用特征[J]. 外语界 161(2):58-66。
卫乃兴(2009),语料库语言学的方法论及相关理念[J]. 外语研究 117(5):36-42。
Appendices
Figure 2.1 The Classification of Lexical Chunks by Altenberg
Figure 3.1 An Example of an Unclean Text
Figure 3.2 An Example of a Reorganized Text
Figure 3.3 An Example of the Word List
Figure 3.4 The Software Extraction of 3-Word Lexical Chunks
Figure 3.5 The Software Extraction of 4-Word Lexical Chunks
Figure 3.6 The Software Extraction of 5-Word Lexical Chunks
Figure 3.7 The Software Extraction of 6-Word Lexical Chunks
A Questionnaire on College Students' Knowledge and Use of English Lexical Chunks
Dear students,
Hello! This questionnaire surveys second- and third-year undergraduates at Guizhou University to identify the factors that influence students' use of lexical chunks in English writing. We hope for your support and cooperation. No answer is right or wrong; please choose the option that best matches your actual situation (each item takes only one answer). Your opinions are very valuable. Thank you for your support! (Note: a lexical chunk is a string of prefabricated continuous or discontinuous words or other meaning units, stored in memory as a whole and retrieved directly in use; lexical chunks can be regarded as the smallest units and the mainstay of language communication and use.)
I. Basic Personal Information
1. Gender  A Male  B Female
2. Grade  A Sophomore  B Junior
II. Use of Lexical Chunks
3. In your everyday English learning, how much attention do you pay to individual words?
A A great deal  B Quite a lot  C A little  D Not much  E None at all
4. In your everyday English learning, how much attention do you pay to lexical chunks?
A A great deal  B Quite a lot  C A little  D Not much  E None at all
5. When reading a text, do you consciously notice phrases or fixed collocations?
A Always  B Often  C Sometimes  D Seldom  E Never
6. When memorizing an English word, do you try to memorize its fixed collocations and usage together with it?
A Always  B Often  C Sometimes  D Seldom  E Never
7. Do you enjoy and often recite good model essays of English writing?
A Always  B Often  C Sometimes  D Seldom  E Never
8. Do you think that using lexical chunks (e.g. pay attention to...; more and more) helps your English writing?
A Very helpful  B Helpful  C Somewhat helpful  D Not helpful  E Not helpful at all
9. Do you think that using full-clause chunks (e.g. What should we do; We must try our best) helps your English writing?
A Very helpful  B Helpful  C Somewhat helpful  D Not helpful  E Not helpful at all
10. Do you think that using opinion-marking chunks (e.g. as far as I am concerned; I strongly believe that...) helps your English writing?
A Very helpful  B Helpful  C Somewhat helpful  D Not helpful  E Not helpful at all
11. After learning a text, do you use its newly learned phrases or sentence patterns when writing English compositions?
A Every time  B Often  C Sometimes  D Occasionally  E Never
12. Are you able to apply newly learned phrases or sentence patterns in your English compositions?
A Every time  B Often  C Sometimes  D Occasionally  E Never
13. When you receive a composition topic, do many chunks related to the topic come to your mind immediately?
A Every time  B Often  C Sometimes  D Occasionally  E Never
14. Do the chunks you use in your writing all come from the words learned in your textbooks?
A Every time  B Often  C Sometimes  D Occasionally  E Never
15. Do you often use chunks to memorize words?
A Every time  B Often  C Sometimes  D Occasionally  E Never
16. In your self-study, do you attach importance to learning and accumulating lexical chunks?
A A great deal  B Quite a lot  C A little  D Not much  E None at all
Interview Questions for Students (30 participants)
1. In the course of your learning, how do you identify lexical chunks?
2. Do you remember lexical chunks more firmly and accurately than individual words?
3. When planning a composition, how do you use lexical chunks to complete your English writing?