= 6th Web as Corpus Workshop (WAC-6) =
To be held in association with [http://naaclhlt2010.isi.edu/ NAACL-HLT] in Los Angeles,
5th/6th June 2010

Sponsored by [http://www.sigwac.org.uk ACL SIGWAC]

=== Invited Speaker: [http://www.patrickpantel.com/ Patrick Pantel], ISI, University of Southern California ===

More and more people are using Web data for linguistic and NLP research. The workshop, the sixth in an annual series, provides a venue for exploring how we can use it effectively and what we will find if we do.

We invite submissions which:
* describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, de-duplication, language-id, tokenising, indexing, ...)
* explore characteristics of Web data from a linguistics/NLP perspective including registers, domains, frequency distributions, comparisons between datasets
* use crawled Web data for NLP purposes (with emphasis on the data rather than the use)
Previous WAC workshops have been in Europe and Africa. The west coast of the US is the global centre for web development, hosting Google, Microsoft, Yahoo and a thousand others, so we are looking forward to visiting!

== Call for Papers ==
* Submission by '''March 1st 2010''', to be made through the NAACL system at https://www.softconf.com/naaclhlt2010/webascorpus/
* Notification of acceptance by March 30
* Camera-ready copy due April 12

Submissions should be formatted using the NAACL 2010 stylefiles, prepared for blind review, and should not exceed 8 pages plus an extra page for references. The stylefiles are available at http://naaclhlt2010.isi.edu/authors.html. Each submission will be reviewed by at least two members of the programme committee. Accepted papers will be published in the workshop proceedings.

== Organising committee ==
* Adam Kilgarriff (Lexical Computing Ltd., Workshop Chair)
* Dekang Lin (Google Inc)
* Serge Sharoff (University of Leeds, SIGWAC Chair)

== Programme committee ==
Organising committee plus:
* Silvia Bernardini, U of Bologna, Italy
* Stefan Evert, U of Osnabrück, Germany
* Cédrick Fairon, UCLouvain, Belgium
* William H. Fletcher, U.S. Naval Academy, USA
* Gregory Grefenstette, Exalead, France
* Igor Leturia, Elhuyar Fundazioa, Basque Country, Spain
* Jan Pomikalek, Masaryk Univ, Czech Republic
* Preslav Nakov, National U of Singapore
* Kevin Scannell, Saint Louis U, USA
* Gilles-Maurice de Schryver, U Gent, Belgium