Changes between Version 24 and Version 25 of WAC-XI


Timestamp:
02/16/17 18:03:28 (7 years ago)
Author:
Roland Schäfer
Comment:

--

  • WAC-XI

    v24 v25  
    24 24 For almost a decade, the ACL SIGWAC, and  most notably the Web as Corpus (WAC) workshops, have served as a platform for researchers interested in the compilation, processing and use of web-derived corpora as well as computer-mediated communication. Past workshops were co-located with major conferences on corpus linguistics and/or computational linguistics (such as ACL, EACL, Corpus Linguistics, LREC, NAACL, WWW). The eleventh Web as Corpus workshop (WAC-XI) emphasises the linguistic aspects of web corpus research more than the technological aspects while keeping in mind that the two are inseparable.
    25 25
    26    The World Wide Web has become increasingly popular as a source of linguistic evidence, not only within the computational linguistics community, but also with theoretical linguists facing problems such as data sparseness or the lack of variation in traditional corpora of written language. Accordingly, web corpora continue to gain relevance, given their size and diversity in terms of genres and text types. In lexicography, web data have become a major and well-established resource with dedicated research data and an environment such as the !SketchEngine. In other areas of linguistics, the adoption rate of web corpora has been slower but steady. Furthermore, some areas of research dealing exclusively with web (or similar) data have emerged, such as the construction and exploitation of corpora based on short messages. Another example is the (manual or automatic) classification of web texts by genre, register, or – more generally speaking – text type, as well as topic area. Similarly, the areas of corpus evaluation and corpus comparison have been advanced greatly through the rise of web corpora, mostly because web corpora (especially larger ones in the region of several billions of tokens) are often created by downloading texts from the web unselectively with respect to their text type or content. While the composition (or stratification) of such corpora cannot be determined before their construction, it is desirable to evaluate it afterwards, at least. Also, comparing web corpora to corpora that have been compiled in a traditional way is key in determining the quality of web corpora with respect to a given research question.
       26 The World Wide Web has become increasingly popular as a source of linguistic evidence, not only within the computational linguistics community, but also with theoretical linguists facing problems such as data sparseness or the lack of variation in traditional corpora of written language. Accordingly, web corpora continue to gain relevance, given their size and diversity in terms of genres and text types. In lexicography, web data have become a major and well-established resource with dedicated research data and specialised tools such as the !SketchEngine. In other areas of linguistics, the adoption rate of web corpora has been slower but steady. Furthermore, some completely new areas of research dealing exclusively with web (or similar) data have emerged, such as the construction and exploitation of corpora based on short messages. Another example is the (manual or automatic) classification of web texts by genre, register, or – more generally speaking – text type, as well as topic area. Similarly, the areas of corpus evaluation and corpus comparison have been advanced greatly through the rise of web corpora, mostly because web corpora (especially larger ones in the region of several billions of tokens) are often created by downloading texts from the web unselectively with respect to their text type or content. While the composition (or stratification) of such corpora cannot be determined before their construction, it is desirable to evaluate it afterwards, at least. Also, comparing web corpora to corpora that have been compiled in a more traditional way is key in determining the quality of web corpora with respect to a given research question.
    27 27
    28 28 === Call for papers === #cfp
    29 29
    30    The eleventh Web as Corpus workshop (WAC-XI) takes a (corpus) linguistic look at the state of the art in all these areas. More specifically, in linguistic publications presenting case studies based on web data, some authors explicitly discuss and/or defend the validity of web corpus data for a specific type of research question – while others simply take web corpora as a new or complementary source of data without discussing fundamental questions of data quality and appropriateness of web data for specific research questions. We think it is vital to discuss such fundamental questions, and therefore ask researchers to present and discuss
       30 The eleventh Web as Corpus workshop (WAC-XI) takes a (corpus) linguistic look at the state of the art in all these areas. More specifically, in linguistic publications presenting case studies based on web data, some authors explicitly discuss and/or defend the validity of web corpus data for a specific type of research question – while others simply take web corpora as a new or complementary source of data without discussing fundamental questions of data quality and appropriateness of web data for a given research question. We think it is vital to discuss such fundamental questions, and therefore ask researchers to present and discuss
    31 31
    32 32 * case studies in corpus or computational linguistics where web data have been used