Changes between Version 10 and Version 11 of WAC-X


Timestamp:
01/24/16 15:29:39 (8 years ago)
Author:
Roland Schäfer
Comment:

--

  • WAC-X

    v10 v11  
    11= 10th Web as Corpus Workshop (WAC-X) =
    22
    3 '''featuring the EmpiriST Shared Task'''[[BR]]
    4 August 12, 2016, Berlin / co-located with [http://acl2016.org/ ACL 2016][[BR]]
    53Endorsed by the Special Interest Group of the ACL on Web as Corpus (SIGWAC)
    64
    7 '''[#cfp 1st Call for Papers is out!]'''
     5Co-located with [http://acl2016.org/ ACL 2016][[BR]]
     6August 12, 2016, Berlin[[BR]]
     7
     8'''[#cfp The Call for Papers is out!]'''
    89
    910== WAC-X main workshop ==
     11The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types. The field is still new, though, and a number of issues in web corpus construction need much additional research, both fundamental and applied. These issues range from questions of corpus design (e.g., assessment of corpus composition, sampling strategies and their relation to crawling algorithms, and handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleaning and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only recently shifted into focus. For almost a decade, the ACL SIGWAC (http://www.sigwac.org.uk/), and especially the highly successful Web as Corpus (WAC) workshops have served as a platform for researchers interested in compilation, processing and application of web-derived corpora. Past workshops were co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, NAACL, LREC, WWW, and Corpus Linguistics).
    1012
    11 The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities, but also with theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and diversity in terms of genres/text types. The field is still new, though, and a number of issues in web corpus construction need much additional research, both fundamental and applied. These issues range from questions of corpus design (e.g., corpus composition assessment, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only recently shifted into focus. For almost a decade, the ACL SIGWAC (http://www.sigwac.org.uk/), and especially the highly successful Web as Corpus (WAC) workshops have served as a platform for researchers interested in compilation, processing and application of web-derived corpora. Past workshops were co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, NAACL, LREC, WWW, Corpus Linguistics).
    12 
    13 See below for information regarding the co-located [#empirist EmpiriST shared task] and the [#paneldisc panel discussion on "Corpora, open science, and copyright reforms"].
     13WAC-X will also feature the final workshop of the EmpiriST 2015 shared task "Automatic Linguistic Annotation of Computer-Mediated Communication / Social Media" (see https://sites.google.com/site/empirist2015/ for details) and the panel discussion "Corpora, open science, and copyright reforms" (see https://www.sigwac.org.uk/wiki/WAC-X#paneldisc for details).
    1414
    1515== Organizers ==
     
    2020* [http://iiegn.eu/work Egon Stemle (European Academy of Bozen/Bolzano)]
    2121
    22 == 1st Call for Papers == #cfp
     22=== Important dates ===
     23
     24* 8 May 2016: Workshop Paper Due date (23:59 GMT-12)
     25* 5 June 2016: Notification of Acceptance
     26* 22 June 2016: Camera-ready papers due
     27* 12 August 2016: Workshop Date
     28
     29== Call for Papers == #cfp
    2330
    2431As in previous years, the 10th Web as Corpus workshop (WAC-X) invites contributions pertaining to all aspects of web corpus creation, including but not restricted to
     
    3340Furthermore, aspects of usability and availability of web-derived corpora are highly relevant in the context of WAC-X
    3441
    35 * development of interfaces
     42* development of corpus interfaces
    3643* visualization techniques
    3744* tools for statistical analysis of very large (e.g., web-derived) corpora
     
    4148
    4249Finally, reports of the use of web corpora in language technology and linguistics are welcome, for example
    43 information extraction & opinion mining
    4450
     51* information extraction & opinion mining
    4552* language modeling, distributional semantics
    4653* machine translation
     
    5562=== Submission format ===
    5663
    57 All submissions must be in PDF format and should follow the ACL 2015 style guidelines. We strongly recommend the use of the ACL 2015 LaTeX style files or Microsoft Word Style files. We reserve the right to reject submissions that do not conform to these styles including font and page size restrictions.
    58 
    59 * [http://acl2015.org/files/acl2015.pdf General instructions (PDF)]
    60 * LaTeX: [http://acl2015.org/files/acl.bst BST], [http://acl2015.org/files/acl2015.sty STY], [http://acl2015.org/files/acl2015.tex TEX]
    61 * MS Word: [http://acl2015.org/files/acl2015.dot DOT]
     64All submissions must be in PDF format and should follow the ACL 2016 style guidelines. We strongly recommend the use of the ACL 2016 LaTeX style files or Microsoft Word Style files. The style files and example documents will be available from the workshop website or directly from http://acl2016.org. We reserve the right to reject submissions that do not conform to these styles including font and page size restrictions.
    6265
    6366Full paper submissions may consist of up to eight (8) pages of content plus any number of pages consisting of only references. Short papers may consist of up to four (4) pages of content plus any number of pages consisting of only references. Full papers will be distinguished from short papers in the proceedings.
     
    6568Papers will be presented either orally or as posters at the workshop. There will be no distinction between papers presented orally and those presented as posters in the proceedings.
    6669
    67 Reviewing of papers will be double-blind. Therefore, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith (1991) previously showed ...". Papers not conforming to these requirements will be rejected without review.
    68 
    69 === Important dates ===
    70 
    71 * 8 May 2016: Workshop Paper Due date (23:59 GMT-12)
    72 * 5 June 2016: Notification of Acceptance
    73 * 22 June 2016: Camera-ready papers due
    74 * 12 August 2016: Workshop Date
     70Reviewing of papers will be double-blind. Therefore, the paper must not include the author's names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith (1991) previously showed ...". Papers not conforming to these requirements will be rejected without review.
    7571
    7672