\newcommand{\thetitle}{Proceedings of the 8th Web as Corpus Workshop (WAC-8)
@Corpus Linguistics 2013}
\newcommand{\authora}{Stefan Evert}
\newcommand{\authorb}{Egon Stemle}
\newcommand{\authorc}{Paul Rayson}
\newcommand{\theauthors}{\authora, \authorb, \authorc}
% init geometry with these values to have them when fancyhdr loads
\PassOptionsToPackage{%
  twoside=false,
  top=1cm,
  bottom=1cm,
  left=2.5cm,
  right=2.5cm,
  includeheadfoot}
{geometry}
\PassOptionsToPackage{%
  pdftitle={\thetitle},
  pdfauthor={\theauthors},
  pdfsubject={},
  pdfkeywords={},
  colorlinks=true,
  linkcolor=blue,
  bookmarkstype=pdf
}
{hyperref}

% use the easychair style
\documentclass[a4paper, onesided]{easychair}

% This provides the \BibTeX macro
\usepackage{doc}
\usepackage{makeidx}

% allow for inclusion of pdf documents
\usepackage{pdfpages}

%\makeindex

% from toc.tex
\usepackage{titletoc}
\titlecontents{subsubsection}[2pt]{\addvspace{10pt}\bfseries\titlerule[0.5pt]\filright}{}{}{}[]
\titlecontents{section}[0pt]{\addvspace{5pt}\filright}{}{}{\dotfill\contentspage}[]
\titlecontents{subsection}[10pt]{\addvspace{1pt}\itshape\filright}{}{}{}[]
\newcommand{\tocSection}[1]{\contentsline{subsubsection}{#1\\*\titlerule[0.5pt]\vspace{-9pt plus 2pt minus 2pt}}{}{}\nopagebreak[4]}
\newcommand{\tocTitle}[2]{\contentsline{section}{#1}{#2}{}\nopagebreak[4]}
\newcommand{\tocAuthors}[1]{\contentsline{subsection}{#1}{}{}}

\DeclareRobustCommand{\insertpdf}[4]{
  \phantomsection
  \addcontentsline{pdf}{section}{#4}
  \addcontentsline{toc}{section}{#3}
  \addcontentsline{toc}{subsection}{#2}
  \fancyhead[LO,LE]{#2}
  \fancyhead[RO,RE]{#4}
  \includepdf[pagecommand={\thispagestyle{plain}}, pages=1]{#1}
  \includepdf[pagecommand={\thispagestyle{fancy}}, pages=2-]{#1}
}

%% Document
%%
\begin{document}

%% Front Matter
%%
\pagenumbering{roman}
\title{\thetitle}

% Authors are joined by \and. Their affiliations are given by \inst, which indexes
% into the list defined using \institute
%
\author{\authora\inst{1} \and \authorb\inst{2} \and \authorc\inst{3}}

% Institutes for affiliations are also joined by \and,
\institute{
  Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
  Erlangen, Germany\\
  %\email{mokhov@cse.concordia.ca}
  \and
  European Academy of Bozen/Bolzano (EURAC),
  Bolzano (BZ), Italy\\
  %\email{geoff@cs.miami.edu}\\
  \and
  Lancaster University,
  Lancaster, U.K.\\
  %\email{andrei@voronkov.com, graham@cs.man.ac.uk}\\
}

\fancyfoot[LO,LE]
{S.Evert, E.Stemle, P.Rayson (eds.)}
\fancyfoot[CO,CE]
{WAC-8, 2013}
\fancyfoot[RO,RE]
{\thepage}

\fancypagestyle{plain}{%
  \fancyhf{} % clear all header and footer fields
  \fancyfoot[R]{{\normalsize\thepage}}
  \renewcommand{\headrulewidth}{0pt}
  \renewcommand{\footrulewidth}{0pt}}

% fine lines above footer and below header
\renewcommand{\headrulewidth}{0.4pt}\renewcommand{\footrulewidth}{0.4pt}

\clearpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\thispagestyle{empty}
Web corpora and other Web-derived data have become a gold mine for corpus
linguistics and natural language processing. The Web is an easy source of
unprecedented amounts of linguistic data from a broad range of registers and
text types. However, a collection of Web pages is not immediately suitable for
exploration in the same way a traditional corpus is.

Since the first Web as Corpus Workshop, organised at the Corpus Linguistics
2005 Conference, a highly successful series of yearly Web as Corpus workshops
has provided a venue for interested researchers to meet, share ideas and
discuss the problems and possibilities of compiling and using Web corpora.
After a stronger focus on application-oriented natural language processing and
Web technology in recent years – with workshops taking place at NAACL-HLT 2010,
2011 and WWW 2012 – the 8th Web as Corpus Workshop returns to its roots in the
corpus linguistics community.

Accordingly, the leading theme of this workshop is the application of Web data
in language research, including linguistic evaluation of Web-derived corpora as
well as strategies and tools for high-quality automatic annotation of Web text.
The workshop brings together presentations on all aspects of building, using
and evaluating Web corpora, with a particular focus on the following topics:

\begin{itemize}
\item applications of Web corpora and other Web-derived data sets for
language research
\item automatic linguistic annotation of Web data, such as tokenisation,
part-of-speech tagging, lemmatisation and semantic tagging
(the accuracy of currently available off-the-shelf tools is still
unsatisfactory for many types of Web data)
\item critical exploration of the characteristics of Web data from a
linguistic perspective and its applicability to language research
\item presentation of Web corpus collection projects or software tools
required for some part of this process (crawling, filtering,
de-duplication, language identification, indexing, \ldots)
\end{itemize}


\clearpage
\renewcommand\contentsname{Table of Contents}
\addcontentsline{pdf}{section}{Table of Contents}
\tableofcontents
\thispagestyle{plain}
\clearpage

%% main matter
%%
\thispagestyle{fancy}
\pagenumbering{arabic}
% paper_9.pdf paper_10.pdf paper_11.pdf paper_2.pdf paper_3.pdf paper_13.pdf paper_5.pdf paper_7.pdf paper_8.pdf paper_6.pdf paper_1.pdf paper_14.pdf

\insertpdf{paper_9.pdf}{A.Minocha, S.Reddy, A.Kilgarriff}{Feed Corpus: An Ever
Growing Up-to-date Corpus}{Feed Corpus}

\insertpdf{paper_10.pdf}{S.Wattam, P.Rayson, D.Berridge}{LWAC: Longitudinal
Web-as-Corpus Sampling}{LWAC}

\insertpdf{paper_11.pdf}{R.Sch\"afer, A.Barbaresi, F.Bildhauer}{The Good, the
Bad, and the Hazy: Design Decisions in Web Corpus Construction}{The Good, the
Bad, and the Hazy}

\insertpdf{paper_2.pdf}{J.Egbert, D.Biber}{Developing a User-based Method of
Web Register Classification}{Developing a User-based Method of Web Register
Classification}

\insertpdf{paper_7-mod.pdf}{A.Piperski, V.Belikov, N.Kopylov, E.Morozov,
V.Selegey, S.Sharoff}{Big and diverse is beautiful: A large corpus of Russian
to study linguistic variation}{Big and diverse is beautiful}

\insertpdf{paper_13.pdf}{D.Lutz, P.Cadwallader, M.Rooth}{A web application for
filtering and annotating web speech data}{Web application for filtering and
annotating web speech data}

\insertpdf{paper_5.pdf}{S.Schulz, V.Lyding, L.Nicolas}{STirWaC - Compiling a
diverse corpus based on texts from the web for South Tyrolean German}{STirWaC}

\insertpdf{paper_3.pdf}{A.Kilgarriff, V.Suchomel}{Web Spam}{Web Spam}

\insertpdf{paper_8.pdf}{A.Ferraresi, S.Bernardini}{The academic
Web-as-Corpus}{Academic Web-as-Corpus}

\insertpdf{paper_6.pdf}{S.Scheible, S.Schulte Im Walde, M.Weller, M.Kisselew}{A
Compact but Linguistically Detailed Database for German Verb Subcategorisation
relying on Dependency Parses from Web Corpora: Tool, Guidelines and
Resource}{Database for German Verb Subcategorisation}

\insertpdf{paper_1.pdf}{A.Brindle}{Thug breaks man's jaw: A Corpus Analysis of
Responses to Interpersonal Street Violence}{Thug breaks man's jaw}

\insertpdf{paper_14-mod.pdf}{C.Crangle}{A web-based model of semantic
relatedness and the analysis of electroencephalographic (EEG) data}{Web-based
model of semantic relatedness and the analysis of EEG data}

%\insertpdf{}{}{}{}

%------------------------------------------------------------------------------
\end{document}

% EOF