Corpus Linguistics and the Study of Meaning in Discourse

January 9, 2014

January 2006. Volume 2

Nelya Koteyko



In this paper I discuss contributions that corpus linguistics can make to the study of meaning in discourse. The article takes account of theories and methodologies within structuralism and poststructuralism, which have opened new avenues towards the analysis and interpretation of meanings in linguistics and in a range of related disciplines, in order to provide a theoretical foundation for the corpus linguistic study of meaning in discourse. The focus is on the qualitative analysis of discourse seen as a concrete socio-historical formation characterised by particular ways of using language. In particular, I am interested in the contribution that corpus linguistics can make to the historically-oriented “genealogical” analysis of discourse in the tradition of Foucault. Taking into account theorisations of the concept of discourse in linguistics and social sciences, suggestions are made for underpinning both the synchronic and diachronic aspects of discourse analysis with a principled collection and documentation of data.

Key words: corpus linguistics, discourse, Foucault, meaning.

1. Introduction
Whilst branches of linguistics such as syntax, semantics, and sociolinguistics have as their aim the description of an aspect of language structure or language use, corpus linguistics is a broader concept that can be applied to many aspects of linguistic enquiry. During its early days corpus linguistics was seen merely as a bundle of methods and procedures that deal with empirical data in linguistics. It was predominantly employed to serve lexicography and language teaching. With the formulation of more theoretical principles underlying the corpus approach, we can observe the emergence of corpus linguistics as a (sub-) discipline in its own right. This has led to a new focus on qualitative analysis together with a concern with discourse in the Foucauldian sense, i.e. as a concrete socio-historical formation characterised by particular ways of using language. This article takes up and develops such an approach.

The article takes account of theories and methodologies within structuralism and poststructuralism, which have opened new avenues towards the analysis and interpretation of meaning in linguistics and in a range of related disciplines, in order to provide a theoretical foundation for the corpus linguistic study of meaning in discourse. First, I outline the uses of the term “discourse” in linguistics and social sciences to show different understandings of discourse analysis within the disciplines. The term “discourse” implies a complex interrelationship between the linguistic and the social, and different approaches construe this relationship on different terms, as there are several ways to see how meaning is created in language use. Depending on the approach, the understanding of the term “discourse” determines the choice of corpus linguistic principles to supplement discourse analysis. Therefore, further in this article I discuss the concept of discourse and discourse analysis within the theoretical framework of corpus linguistics to demonstrate how corpus linguistics can contribute not only to the analysis of discourse on the level of the quantitative studies of lexis and syntax but also to discourse analysis aimed at the interpretation of lexical items in a particular context (i.e. studies where discourse is theorised as a complex relationship between language, ideology and society).

2. Discourse: The Problem of Definition
Currently, the notion of discourse is employed across a range of disciplines from linguistics to cultural studies and anthropology, and can mean “something as specific as spoken language, or something as general as the social process of communication” (Lemke, 1995, p. 6). The multiplicity of disciplines and approaches that study written or spoken communication makes an attempt to define discourse a difficult task, and it is not my aim here to provide a comprehensive overview of all the approaches. The objective is to present central ideas that influenced the development of the concept of “discourse” in social sciences and linguistics in order to discuss the use of the term in critical discourse analysis and corpus linguistics.

2.1 The Term “Discourse” in Linguistics
In linguistics, discourse has developed as the main object of investigation in two sub-disciplines: conversation analysis and the analysis of written text. Hence at least two definitions of discourse have been elaborated. Here discourse is predominantly seen as 1) language above the sentence level, that is, extended chunks of text; 2) language in use.

For example, Conversation Analysis (Schegloff and Sacks, 1973; Schegloff, 1998) – a research tradition that grew out of ethnomethodology – studies the social organization of “talk-in-interaction” by a detailed inspection of tape recordings and transcriptions made from such recordings. Therefore, for practitioners of conversation analysis discourse is first of all a naturally occurring conversation, i.e. instances of language in use. It is characterised by a two-level view of discourse – the micro level of utterance and the macro level of context of situation.

Compared to previous sentence-dominated models of text analysis, written discourse analysis offers a fundamentally different way of looking at language that proved to be particularly useful in language teaching (Widdowson, 1978). Descriptive discourse approaches to the analysis of written texts are exemplified by the works of Hoey (1994), Winter (1994) and Coulthard (1994). These studies look at texts in terms of their vocabulary and grammar, and at how these relate to cohesion and to the realization of the text’s micro and macro structure. Genre Analysis (Swales, 1990) is another discourse analytical approach, where the conventions common to texts of a similar type, for example, academic articles, are described.

In general, discourse analysis in Applied Linguistics investigates how lexico-grammatical forms take on meanings in particular contexts, thus seeking to match form and function. Although the emphasis is on how the context affects the use of language (discourse), the proponents of such discourse analysis are not concerned with the ideological implications of language use. This version of discourse analysis does not aim to explore why and how individuals come to say certain things, as the users of language are seen as more or less autonomous actors who establish meanings by intention and inference. This contrasts with Marxist approaches (critical discourse analysis discussed further in this paper) that operate with a broader understanding of context and thus with a politicized view of discourse, where the subject is “interpellated” by discourse or ideology (Althusser, 1971).

2.2 The Term “Discourse” in Social Sciences
In the late 1960s significant shifts occurred in the conceptualisation of how meanings are constructed through the social use of language. The models developed as a result of this shift have the notion of discourse as their central category. Their common feature is the definition of discourse as a form of social practice. The new angle on the view of discourse challenged the structuralist concept of “language” as an abstract system (Saussure’s langue) and emphasized the process of making and using meanings within particular historical, social, and political conditions. At this level, then, the term discourse is employed to explain the conditions of language use within the social relations that structure them.

2.2.1 Foucault’s archaeology and genealogy.
Foucault’s approach to discourse is central to many works in social sciences. Despite the wide influence of his works on the conceptualisation of discourse, Foucault does not provide a consistent and clear definition of this term. Foucault (1989) himself acknowledges the wide range of meanings that the term “discourse” has in his works:

Instead of gradually reducing the rather fluctuating meaning of the word ‘discourse’ I believe I have in fact added to its meanings: treating it as sometimes the general domain of all statements, sometimes as an individualisable group of statements, and sometimes as a regulated practice that accounts for a number of statements. (p. 80)

The main points in Foucault’s discussion of discourse in his “Archaeology of Knowledge” are as follows: 1) the smallest unit of discourse is a statement; 2) discourse is the body of formulated statements and represents the archive of the discourse analyst; 3) regularity in dispersion of statements is called discursive formation: “whenever one can describe, between a number of statements, such a system of dispersion, whenever, between objects, types of statements, concepts, or thematic choices, one can define a regularity, we will say, for the sake of convenience, that we are dealing with discursive formation” (Foucault, 1989, p.38).

For Foucault then, discourse does not consist of texts but of statements. Texts or books do not have strict boundaries to provide the basis for discourse analysis. A statement subscribes to certain concepts and is a statement only in the surrounding of formulations that it implicitly or explicitly refers to, by way of modifying them, repeating them, or opposing them. According to Foucault, statements always invoke other statements in one way or another: statements do not only relate to previous statements but also contain some features of the future ones.

In contrast to literary analysis, Foucault’s discourse analysis does not see a book or a text as the embodiment of the writer’s thoughts, experiences, or unconscious. It does not strive to interpret texts in order to make them complete. Therefore, any analysis of a statement should not go back to the author and his intentions or circumstances. According to Foucault (1989), a statement should be analysed as it appears in discourse, and cannot be reduced to expressing anything external, such as, for example, an underlying intention or context:

Archaeology tries to define not the thoughts, representations, images, themes, pre-occupations that are concealed or revealed in discourses; but those discourses themselves, those discourses as practices obeying certain rules…It is not an interpretative discipline: it does not seek another, better-hidden discourse. (p.138)

A study of discourse based on Foucault’s views would thus be concerned with “the rules (practices, technologies) which make a certain statement possible to occur and others not at particular times, places and institutional locations” (Foucault, 1989, p.21). This is of importance for the historical analysis of the meaning of lexical items: each time period is characterized by its own means of knowledge production. This kind of analysis aims to clarify why a particular knowledge is articulated in the specified time period, and how it finds reflection in the meaning of lexical items used in this period.

Foucault’s archaeology is predominantly a synchronic analysis of statements in discourse seeking to uncover complexities of texts and how each discourse delimits its own boundaries. At a later stage of his work, Foucault turns to the problems of power and develops a historically oriented “genealogical” line of analysis drawing on Nietzsche’s “Genealogy of Morals” (Foucault, 1984). Here we see Foucault using the term “discourse” to refer to any written or spoken language in which power is exercised: “As history constantly teaches us, discourse is not simply that which translates struggles or systems of domination, but is the thing for which and by which there is struggle” (Foucault, 1981, p. 372). Therefore, power is always present regardless of the approach to things chosen, as long as the objects we relate to are objects produced by discourse.

Foucault represents the European (mainly French) tradition of discourse analysis, which embraces the “socio-historical-political” view of discourse. This approach tends to theorise discourse from the very beginning as “socio-historically specific systems of knowledge and thought” (McHoul and Luke, 1989, p. 324). According to Foucault then, discourse is inseparable from ideology, although he avoids the use of the term itself. Meaning, as studied in discourse, is always ideological. This contrasts with the discourse analysis carried out according to the Anglo-American tradition (Critical Discourse Analysis discussed below), where the analysis is carried on within a dualistic framework of linguistic analysis and an added-on political dimension (McHoul and Luke, 1989).

As Foucault’s view of discourse does not include the concept of ideology, there still remains the difficulty of explaining the ways in which oppositional political ideologies are constituted and function (Laclau and Mouffe, 1985, pp. 134-45). In this regard, Howarth (2002), for example, suggests supplementing Foucault’s genealogical account of political discourse with a post-Marxist concept of hegemonic practice that enables one to explain the formation of oppositional ideologies. In a similar way, Michel Pêcheux successfully incorporates the concept of oppositional ideologies into the theory of language and discourse. His work is discussed in the next section.

2.2.2 “Discourse” in the works of Michel Pêcheux.
The French discourse theorist Michel Pêcheux works in the space between the “subject of language” and the “subject of ideology”. His work is characterised by a pronounced focus on establishing the connections between the linguistic theory and the theory of discourse and provides insights into the conditions for an oppositional politics of the production of meaning.

In his “Language, Semantics and Ideology”, Pêcheux sees discourse as an intermediate link between language and ideology (language here is seen as the object of linguistics, i.e. the Saussurean langue) as he attempts to clarify the links between the “obviousness of meaning” and “the obviousness of the subject” (1982, p.55). For Pêcheux, discursivity is not indifferent to ideological struggles, because “every discursive process is inscribed into an ideological class relationship” (1982, p. 59).

In the traditional view of the lexicon, a lexeme is seen as the smallest carrier of meaning, and words as having their own meaning. In contrast to this position, Pêcheux maintains that words do not have their own “word” meanings. As he writes, “a word, expression or proposition does not have a meaning of its own, a meaning attached to its literality.” Pêcheux emphasizes that meaning “does not exist anywhere except in the metaphorical relationships (realized in substitution effects, paraphrases, synonym formations) which happen to be more or less provisionally located in a given discursive formation: words, expressions, and propositions get their meanings from the discursive formation to which they belong” (Pêcheux, 1982, p. 188).

Lexemes are thus seen as having a discursive meaning identifiable through their interrelations with other lexemes. As in Foucault’s view of discourse, meaning is seen as dependent on a complex system of statements and is thus influenced by the discursive practice. Following Pêcheux’s analytical insights, the study of paraphrases, those “metaphorical relationships”, as he calls them, in which meaning is located within the boundaries of a discursive formation, can lead to insights into the ideological dimension of meaning.

According to one of Pêcheux’s main theses “words, expressions, propositions, etc., change their meaning according to the [ideological] positions held by those who use them, which signifies that they find their meaning by reference to those positions; that is, by reference to the ideological formations in which those positions are inscribed” (1982, p.111). Here Pêcheux suggests that the naturalness or obviousness of words or expressions (after Althusser, 1971) leads to changes in their meaning as they “slide” or “slip” from one discursive formation to another. Therefore, new meanings of lexical items arise from interdiscursive relations and are the result of the struggle for power.

2.3 The Term “Discourse” in the Works of Critical Discourse Analysts
So far, a clear distinction has been maintained between the use of the term “discourse” in Linguistics and in Critical Theory – a branch of scholarship elaborated by a group of scholars from the Frankfurt school that deals with the development of emancipatory knowledge, that is, knowledge that can empower otherwise powerless groups and lead to the creation of a society free from domination by anyone’s interests. However, there are examples of the definition of the term that draw on both disciplines. Thus, proponents of Critical Discourse Analysis (CDA) fuse the linguistic and critical theory definitions of the term and focus on “not just describing discursive practices, but also showing how discourse is shaped by relations of power and ideologies, and the constructive effects discourse has upon social identities, neither of which is normally apparent to discourse participants” (Fairclough, 1992, p.12).

This direction in discourse analysis is therefore highly politicized, as it is devised to bring out hidden meanings and implicit assumptions that would otherwise escape critical attention. Drawing on the works of a number of influential discourse theorists (including the above-mentioned works of Foucault and Pêcheux), CDA aims to help the analyst understand social problems that are mediated by mainstream ideology and power relationships. Discourse is seen as both socially constituted and socially constitutive as it produces objects of knowledge, social identities and relationships between people (Fairclough, 1995).

For example, Fairclough’s works, “Language and Power” (1989) and “Critical Discourse Analysis” (1995), articulate a three-dimensional framework for studying discourse, “where the aim is to map three separate forms of analysis onto one another: analysis of (spoken or written) language texts, analysis of discourse practice (processes of text production, distribution and consumption) and analysis of discursive events as instances of sociocultural practice” (1995, p. 2).

An important method for analysing discursive practice for Fairclough is through the concept of “intertextuality” which (following Bakhtin, 1981) refers to the way texts derive their meaning from other texts (Fairclough, 1992). Fairclough approaches intertextuality on the macro level of narratives, genres, and discourses. The intertextual analysis aims to show how media texts are constituted through often hybrid configurations of different genres and discourses, which in turn constitute the larger “orders of discourse”. In this sense, the analysis of discourse practice relates textual analysis to the analysis of sociocultural practice.

A number of criticisms have been raised about CDA’s methods of data collection and description. There is no typical way of collecting data in CDA. Some authors do not even mention data collection methods and others rely strongly on traditions based outside the sociolinguistic field (cf. Titscher et al., 2000). There is little discussion about the statistical and theoretical representativeness of the material analysed (Stubbs, 1997). Many CDA studies deal with only small corpora which are usually regarded as being typical of certain discourses. Hence the criticism concerns the untheorised choice and use of fragmentary textual material, which makes replication and comparison of different studies difficult to achieve.

For example, Fairclough’s approach to critical discourse analysis is designed for the analysis of a relatively small number of texts. A very detailed linguistic analysis suggested by his framework would be impossible to carry out on a large collection of texts. Therefore, Fairclough uses carefully selected texts only to exemplify the main categories of his approach. This emphasis on micro-linguistic analysis makes it difficult to transfer the results to the macro level of social theory.

The procedure of CDA is defined as an interpretative process (Meyer, 2001, p. 16), although the stance of performing interpretative (hermeneutic) analysis is not made explicit by all critical discourse analysts. The interpretative procedure, as a method of identifying and summarising meaning relations, presupposes that a substantial amount of data is analysed, because the process of interpretation is based upon the identification of the links a text or a text segment has with other texts. Consequently, a detailed documentation of the data used in the investigation is also necessary. In contrast to the outlined procedure, CDA adopts a rather “text-reducing” method of analysis as it concentrates on clear formal properties of a small number of texts, which contradicts its “hermeneutic endeavour” (Meyer, 2001, p. 16).

Another point of criticism is the lack of diachronic studies in CDA. CDA analytical frameworks, with the exception of Wodak’s historical method (Weiss and Wodak, 2003), tend to be ethnographic, and thus synchronic. Critical discourse analysts identify changes in meanings of lexical items when they talk about their connotations signalling ideological bias, but because of their limited use of textual sources, they do not document these changes diachronically or quantitatively. However, any attempt at understanding the impact of temporal context needs to include a diachronic perspective, and in one of the following sections of this paper I will discuss how corpus linguistics can be employed for diachronic investigations of meaning in discourse.

3. Discourse and Discourse Analysis in Corpus Linguistics
In her criticism of sociolinguistics, Hasan (2004) emphasizes the importance of data driven research within the field that investigates the interrelations between the linguistic and the social. Only when sociolinguistics allows “data to speak to it” does it become obvious that language has to be viewed as meaning potential. Below I introduce the view of discourse in corpus linguistics which, being a strongly data driven approach, can not only complement discourse analysis in Applied Linguistics and CDA but can also serve as a theoretical framework for the historically oriented “genealogical” analysis in the Foucauldian sense.

3.1 The Term “Discourse” in Corpus Linguistics
Corpus research started out as a methodological approach based on collecting and documenting real-life language data. The field was established in 1967, when Henry Kucera and Nelson Francis published their classic work “Computational Analysis of Present-Day American English” on the basis of the Brown Corpus. Corpus linguists emphasize the importance of studying patterns of real language use in linguistic research. They advocate an analysis of language based on large collections of authentic texts – corpora. Corpora are used to derive empirical knowledge about language, which can supplement information from reference sources and introspection.

For corpus linguistics then, discourse is a totality of texts produced by a community of language users who identify themselves as members of a social group on the basis of the commonality of their world views (Teubert, 2005). Their shared attitudes and beliefs find reflection in the way members of such a community use language: which topics they highlight in their conversations, which expressions recur in their day-to-day interaction etc. (the view of discourse community common in Cultural Studies). Such a discourse is what Foucault (1989, p. 80), cited above, refers to as an “individualisable group of statements”, that is, a group of statements which seem to exemplify a similar set of concerns and which have some coherence, e.g. “discourse of organic food promotion” or “discourse of British left wing press”.

As corpus linguists have nothing but texts at their disposal, they have to be content with the version of reality supplied by the discourse members who produce the texts. In contrast to the tradition of Saussurean structuralism, as well as some approaches in CDA (for criticism see Pennycook, 1994), which assume that there is an “underlying” pre-given reality beyond signs, a reality obscured by “misinterpretation”, the corpus-driven approach views discourse as a self-referential system (Teubert, 2005) and meaning as an entirely discourse internal phenomenon.

According to such a view, discourse has a reality of its own, constructed of past and current texts, and is thus constitutive of its objects. Facts can be found only within discourse. This is not to say that discourse external “real world” facts do not exist, but that they are not knowable to us, cannot be communicated and thus do not have any meaning outside discourse. Consequently, texts never show us what “really” happened, only “the narrative of what happened” with a point of view and cultural/ideological interests (Irvine, 1994).

The social perspective on meaning advocated by corpus linguists implies that meaning is seen not as personal or cognitive but as cultural and shared. This is characteristic of the social constructionist approach, according to which “…it is within social interaction that language is generated, sustained, and abandoned. . . The emphasis is thus not on the individual mind but on the meanings generated by people as they collectively generate descriptions and explanations in language” (Gergen and Gergen, 1991, p. 78).

Thus, the corpus linguistic approach is compatible with the Foucauldian analysis of discourse, though the latter is not linguistic in the traditional sense. As discussed above, for traditional linguistics, discourse is language in use, a communicative exchange, not a complex entity that extends into the realms of ideology, strategy, language and practice, and is shaped by the relations between power and knowledge, as it is for Foucault and, currently, for the proponents of CDA. Nevertheless, there are common points which allow merging linguistic and “archaeological” methods of research in the corpus-driven approach to the study of discourse: 1) the view of language as a social construct, and 2) the emphasis on historical and cultural aspects of meaning production in discourse. From this perspective, the corpus-driven approach to discourse would be focused not on how meanings are constructed between sentences, which is characteristic of the abovementioned approach to discourse analysis in Applied Linguistics, but rather on how meanings come to be articulated at particular moments in history.

3.2 Corpus Linguistics and Quantitative Methods of Discourse Analysis
During its first years, corpus research was widely used to complement methodologies in the studies of linguistic variation. Its quantitative methods were new and quickly became popular in various branches of language analysis. Corpus linguistics can be, and indeed has been, used to supplement both the discourse analysis in Applied Linguistics (the “non-critical” discourse analysis employed in language teaching) and Critical Discourse Analysis aimed at revealing ideological biases on the basis of the synchronic studies of lexical patterns (Orpin, 2005 among the most recent).

Thus, the predominantly synchronic corpus-driven approach following the British traditions of text analysis proclaims a close link between co-text and context. It is assumed that the choice of words in a text reflects social choices, and it is in this way that the selection at the textual level is seen as reflecting the contextual level dealing with social and cultural aspects. This link between co-text and context is important for the study of the language of a particular discourse, and also enables comparison between discourses, as the same words and expressions within the same language can have different semantic values for people from different discourse communities. By comparing the ways that discourse communities use language on the basis of corpora specifically tailored for that purpose, particularly with respect to the lexical choices they make, a corpus linguist can gain a good picture of what it is that makes their language ideological.
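The comparison of lexical choices across two discourse communities can be sketched in code. The following is a minimal illustration in Python, not a procedure prescribed by the approaches discussed here: it assumes that each corpus is simply a list of text strings, uses a deliberately crude tokenizer that keeps only unaccented Latin letters, and ranks words with the log-likelihood statistic, which is one commonly used keyness measure among several. The function names (`tokenize`, `log_likelihood`, `keywords`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood of one word's frequency in corpus A versus corpus B."""
    total = freq_a + freq_b
    # Expected frequencies if the word were used equally often in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll


def keywords(corpus_a, corpus_b, top=10):
    """Rank words by how strongly their frequency differs between two corpora."""
    tokens_a = [t for text in corpus_a for t in tokenize(text)]
    tokens_b = [t for text in corpus_b for t in tokenize(text)]
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    rows = []
    for word in set(counts_a) | set(counts_b):
        score = log_likelihood(counts_a[word], len(tokens_a),
                               counts_b[word], len(tokens_b))
        rows.append((word, counts_a[word], counts_b[word], score))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]
```

A high score only signals that a word is distributed unevenly between the two corpora; whether the difference is ideologically significant remains a matter of interpretation based on the concordance lines.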

The computer software allows systematic analysis of discursive patterns without recourse to the authors and their intentions. Therefore, a segment of discourse in the form of a corpus is analysed as a collective body of statements and not as a collection of opinions of individual authors. Details about the authors’ social background are seen as irrelevant for the purpose of the investigation. Concordance or collocation software picks out only recurrent patterns, thus producing empirical evidence for how the object of discourse is formed. Not employing the notion of authority and authorship, we automatically discard any interest in what lies hidden; the analysis is concerned only with what is on the “surface” of the texts.
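What concordance and collocation software does at its simplest can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular package: the corpus is taken to be a list of plain text strings with no author information attached, the window of four words on either side and the minimum frequency of two are arbitrary illustrative settings, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def concordance(texts, node, span=4):
    """Return (left context, node word, right context) for every occurrence."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - span):i])
                right = " ".join(tokens[i + 1:i + 1 + span])
                lines.append((left, token, right))
    return lines


def collocates(texts, node, span=4, min_freq=2):
    """Count words co-occurring with the node word; keep only recurrent ones."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                counts.update(tokens[max(0, i - span):i])
                counts.update(tokens[i + 1:i + 1 + span])
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]
```

Because the functions receive nothing but the texts, the output reflects only what is on the “surface” of the corpus; the `min_freq` threshold is what restricts the result to recurrent patterns.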

Stubbs (1996) provides an example of such a methodological framework. Subscribing to the views on discourse of Foucault and Firth, he examines culturally important keywords and fixed phrases, “the kinds of things that are repeatedly said, in discourse which is jointly constructed, but which is known consciously by no one”, with corpus linguistic methods (1996, p. 194). Such an approach is corpus-driven: it constitutes a methodology that uses a corpus beyond the selection of examples to support linguistic arguments or to validate a theoretical statement (Tognini-Bonelli, 2001). The theoretical statements, as well as comments or recommendations made, arise directly from, and reflect, the evidence provided by the corpus.

3.3 Corpus Linguistics and Qualitative Analysis of Discourse
With the formulation of more theoretical principles underlying the corpus approach (Thomas and Short, 1996; Teubert, 2005), we can observe a new focus on qualitative analysis. In the course of linguistic analysis the following question is posed: what are the rules governing the production of a particular statement, and other statements related to it? During the investigation of meaning in a particular discourse the focus is on a different issue, namely: why this particular statement and not another? Here there is a shift of emphasis from description to explanation, characterised by the objective to analyse the specific meaning construed in discourse within particular spatial and temporal frames. This is the view of discourse analysis found in the works of Foucault, whose interest was in “seeing historically how effects of truth are produced within discourses which in themselves are neither true nor false” (Foucault, 1980, p.118).

If meaning is to be searched for in discourse itself, as suggested by Foucault, then we need a much larger “archive” of texts to look for reactions and paraphrases (i.e. all additions and changes to meaning) than is normally compiled by critical discourse analysts. An archive of statements in machine-readable form would then allow the discursive emergence of meaning to be studied on the level of language. Corpus linguistics – being a (sub-) discipline that deals with large bodies of authentic language data – can provide such a possibility. However, the large quantity of real language data is only one essential component of the “socio-historically” oriented study of discourse. The principled choice of texts and their arrangement represent another important constituent.

The principled collection of texts in the form of a corpus has to have a number of specific characteristics, which would make a corpus not merely a tool but a concept in discourse analysis. According to Busse et al. (1994, p.14), texts that make up a corpus representing a segment of discourse have the following features: they deal with a particular theme, object, knowledge complex or concept; they are interconnected in accordance with the specific purpose of the communication; they are defined by specific parameters such as time period, area, segment of society or text type; and, essentially, they are characterised by implicit or explicit textual or semantic (contextual) connections which make a corpus an intertextual entity. It seems to me that another essential characteristic of such a corpus is the chronological arrangement and full documentation of texts. Only then can we be sure that such a corpus will enable the analyst to investigate the socio-historical aspects of meaning production in discourse.
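One possible way of recording such a corpus in machine-readable form is sketched below in Python. It is an illustration only: the metadata fields (`text_id`, `published`, `source`, `text_type`, `topic`) are hypothetical choices loosely modelled on the parameters listed above, and a real project would document more, for instance the selection criteria themselves.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CorpusText:
    """One documented text in a corpus representing a segment of discourse."""
    text_id: str
    published: date   # external parameter: date of publication
    source: str       # e.g. the newspaper or institution (hypothetical field)
    text_type: str    # e.g. "editorial", "leaflet"
    topic: str        # internal content criterion shared by the texts
    body: str


def build_corpus(texts, topic, start, end):
    """Select texts on one topic within a period and arrange them chronologically."""
    selected = [t for t in texts
                if t.topic == topic and start <= t.published <= end]
    return sorted(selected, key=lambda t: t.published)
```

Keeping the documentation attached to each text, and the texts in chronological order, is what later allows a statement to be traced back to the earlier statements it takes up.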

The principled collection of data can help to address the problem of the replicability of the analysis, which is an important issue in qualitative studies of discourse. Strict “objectivity” can never be achieved by means of discourse analysis, as it inevitably embeds the beliefs and ideologies of the analysts. However, it is possible to make the analysis replicable by detailing the analytical steps taken and making the data available. Corpora, as well as the software used in the investigation, are normally accessible to anyone who wishes to carry out his/her own investigation.

Given the predominance of (internal) content criteria (such as common topic and intertextual links) over external objective parameters (e.g. date) in the make-up of the corpus, the descriptive frames of the corpus-driven approach differ from those of traditional linguistic approaches. Discourse in such an approach is defined less by objective parameters of time and space than intentionally by its content. Therefore, whereas descriptive linguistics takes its analytical categories externally out of the formal relationships between linguistic entities in the texts under investigation (for example, from a collection of texts that belong to some time period) and then makes explicit formal connections (what makes us say that these texts belong to standard English or a variety of English), the approach to discourse in a “socio-historically” oriented corpus-linguistic study would be focused on the internal content-driven connections between texts, i.e. what makes these texts belong to the discourse of, for example, the British left-wing press.

The analysis of paraphrases (seen as “metalinguistic statements” that serve for explanation, explication or re-definition (Teubert, 2005)) within a corpus that represents a segment of discourse can complement the hermeneutic endeavours of discourse analysts interested in how utterances came to be made and how their production was constrained. As pointed out by Pecheux (1982), paraphrases play a crucial role in the process of meaning construction in discourse. In a corpus that represents a segment of discourse they indicate various links that connect text segments, and are easily identifiable with the help of the search tools available through most corpus linguistic software. The study of paraphrases in such a corpus thus allows a detailed and documented diachronic analysis of the intertextual links that uniquely characterise any text segment in the focus of analysis.

From the point of view of French discourse analysis, meaning is always the result of a hegemonic struggle. Automatic searches for paraphrases of a lexical item can also help to trace the various and often contrasting definitions given to a word that circulates in general discourse, and in this way reveal possible sites of conflict. From this perspective, the interpretation of meaning in terms of paraphrases supplied by members of contending discourse communities will bring us closer to historically situated discourse analysis, as well as enable the documented analysis of meaning change.
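The kind of automatic search for paraphrases discussed in this section can be sketched, under simplifying assumptions, as a pattern search over chronologically ordered texts. The cue phrases used below ("means", "is defined as", "so-called" and so on) are illustrative assumptions rather than an inventory taken from Teubert or Pecheux, the function name `find_paraphrases` is hypothetical, and the sentence splitting is deliberately naive.

```python
import re
from datetime import date


def find_paraphrases(term, texts):
    """Return (date, source, sentence) hits for `term`, oldest first.

    `texts` is an iterable of (published, source, body) tuples.
    The cue patterns are illustrative assumptions, not a tested inventory.
    """
    t = re.escape(term)
    cues = [
        rf"\b{t}\b\s+(?:means|refers to|is defined as|is understood as)\b",
        rf"\b(?:by|what is meant by)\s+{t}\b",
        rf"\bso-called\s+{t}\b",
    ]
    pattern = re.compile("|".join(cues), flags=re.IGNORECASE)
    hits = []
    # Sort by publication date so that hits can be read diachronically.
    for published, source, body in sorted(texts, key=lambda x: x[0]):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            if pattern.search(sentence):
                hits.append((published, source, sentence.strip()))
    return hits
```

Because each hit keeps its date and source, competing definitions of the same word supplied by different newspapers or communities can be set side by side and followed over time. The interpretation of these hits remains a qualitative task for the analyst.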

4. Conclusions
The discussion of theoretical points undertaken in this article was intended to demonstrate how corpus linguistics can be a useful framework for the study of meaning in discourse. Although corpus linguistics has a lot to offer for the synchronic investigation of meaning in terms of frequency information for words, phrases, or constructions used in discourse, a particular focus of this paper has been to discuss how it can also be an important framework for the study of the unstable and disputed nature of meaning.

The theoretical framework for reading and interpreting texts and text segments in their interdiscursive conditions of emergence, established on the basis of the works of Foucault and Pecheux, enables the analyst to study the meanings of lexical items through their paraphrases in different discourses. In order to be able to analyse discourse in the Foucauldian sense from the linguistic perspective, we need to have recourse to texts produced within a community of speakers, delimited according to our research purposes. In this paper I have discussed how corpora compiled according to a set of pre-determined criteria (such as, for example, content-driven connections between texts and the chronological arrangement of texts) offer the means to study the emergence of the meanings of lexical items within the discourse communities in question. It is in this way that the principled collection and detailed documentation of texts in corpus linguistics makes it possible to gain empirical evidence for assertions made in the CDA context and complements its qualitative analysis by strengthening the interpretative basis.

