
{{{#!comment
SUBJECT: Call for Participation: 8th Web as Corpus Workshop (22 July 2013, Lancaster, UK)
"""
CALL FOR PARTICIPATION

8th Web as Corpus Workshop (WAC-8)
Endorsed by ACL SIGWAC
Hosted by the Corpus Linguistics 2013 Conference

Monday, 22 July 2013 (Lancaster, UK)

** Note that registration for the workshop and the main conference closes on SUNDAY, JUNE 30. **
Registration URL: http://ucrel.lancs.ac.uk/cl2013/register.php

Further details can be found on the workshop homepage at

http://sigwac.org.uk/wiki/WAC8

______________________________________________________________________

Web corpora and other Web-derived data have become a gold mine for corpus linguistics and natural language processing. The Web is an easy source of unprecedented amounts of linguistic data from a broad range of registers and text types. However, a collection of Web pages is not immediately suitable for exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics 2005 Conference, a highly successful series of yearly Web as Corpus workshops has provided a venue for interested researchers to meet, share ideas and discuss the problems and possibilities of compiling and using Web corpora. After a stronger focus on application-oriented natural language processing and Web technology in recent years (with workshops taking place at NAACL-HLT 2010, 2011 and WWW 2012), the 8th Web as Corpus Workshop returns to its roots in the corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data in language research, including linguistic evaluation of Web-derived corpora as well as strategies and tools for high-quality automatic annotation of Web text. The workshop brings together presentations on all aspects of building, using and evaluating Web corpora, with a particular focus on the following topics:

* applications of Web corpora and other Web-derived data sets for language research
* automatic linguistic annotation of Web data, such as tokenisation, part-of-speech tagging, lemmatisation and semantic tagging (the accuracy of currently available off-the-shelf tools is still unsatisfactory for many types of Web data)
* critical exploration of the characteristics of Web data from a linguistic perspective and of their applicability to language research
* presentation of Web corpus collection projects or of software tools required for some part of this process (crawling, filtering, de-duplication, language identification, indexing, ...)

______________________________________________________________________

PROGRAMME

09:00 Akshay Minocha, Siva Reddy and Adam Kilgarriff -- Feed Corpus: An Ever Growing Up-to-date Corpus
09:30 Stephen Wattam, Paul Rayson and Damon Berridge -- LWAC: Longitudinal Web-as-Corpus Sampling
10:00 Roland Schäfer, Adrien Barbaresi and Felix Bildhauer -- The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction
10:30 Jesse Egbert and Douglas Biber -- Developing a User-based Method of Web Register Classification

11:00 - 11:30 Tea Break

11:30 Adam Kilgarriff and Vít Suchomel -- Web Spam
12:00 David Lutz, Parry Cadwallader and Mats Rooth -- A web application for filtering and annotating web speech data
12:30 Sarah Schulz, Verena Lyding and Lionel Nicolas -- STirWaC - Compiling a diverse corpus based on texts from the web for South Tyrolean German

13:00 - 14:00 Lunch

14:00 Alexander Piperski, Vladimir Belikov, Nikolay Kopylov, Vladimir Selegey and Serge Sharoff -- Big and diverse is beautiful: A large corpus of Russian to study linguistic variation
14:30 Adriano Ferraresi and Silvia Bernardini -- The academic Web-as-Corpus
15:00 Silke Scheible and Sabine Schulte im Walde -- A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus

15:30 - 16:00 Tea Break

16:00 Andrew Brindle -- Thug breaks man's jaw: A Corpus Analysis of Responses to Interpersonal Street Violence
16:30 Colleen Crangle -- A web-based model of semantic relatedness and the analysis of electroencephalographic (EEG) data
17:00 Discussion and wrap-up

18:00 Pub

______________________________________________________________________

Looking forward to seeing you at the workshop,
The organising committee

Stefan Evert, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Egon Stemle, European Academy of Bozen/Bolzano (EURAC)
Paul Rayson, Lancaster University
"""
}}}