Mathematical linguistics

Mathematical linguistics is the application of mathematics to model phenomena and solve problems in general linguistics and theoretical linguistics. Mathematical linguistics has a significant amount of overlap with computational linguistics.

Discrete Mathematics

Discrete mathematics is used in language modeling, including formal grammars, language representation, and historical linguistic trends.

Set Theory

Semantic classes, word classes, natural classes, and the allophonic variations of each phoneme in a language are all examples of applied set theory. Set theory and concatenation theory are used extensively in phonetics and phonology.
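As a rough illustration of how natural classes behave like sets, the sketch below selects phonemes by shared feature values and intersects the results. The six-phoneme inventory, the feature names, and the helper `natural_class` are simplified assumptions made for this example, not a description of any particular language.

```python
# Hypothetical, heavily simplified feature table (an assumption for illustration).
FEATURES = {
    "p": {"voiced": False, "nasal": False, "place": "labial"},
    "b": {"voiced": True, "nasal": False, "place": "labial"},
    "m": {"voiced": True, "nasal": True, "place": "labial"},
    "t": {"voiced": False, "nasal": False, "place": "alveolar"},
    "d": {"voiced": True, "nasal": False, "place": "alveolar"},
    "n": {"voiced": True, "nasal": True, "place": "alveolar"},
}


def natural_class(**spec):
    """Return the set of phonemes whose features match every value in spec."""
    return {
        phoneme
        for phoneme, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    }


voiced = natural_class(voiced=True)
labial = natural_class(place="labial")

# A class defined by two features is the intersection of the two simpler classes.
voiced_labials = voiced & labial
print(sorted(voiced_labials))
```

Set intersection here mirrors the way a class such as "voiced labials" is defined by combining feature specifications.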

Combinatorics

In phonotactics, combinatorics is useful for determining which sequences of phonemes are permissible in a given language, and for calculating the total number of possible syllables or words, based on a given set of phonological constraints. Combinatorics on words can reveal patterns within words, morphemes, and sentences.
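The counting described above can be sketched for a toy case. The inventory, the onset-nucleus-coda syllable template, and the single constraint in `is_permissible` are all hypothetical assumptions chosen to keep the arithmetic easy to follow.

```python
from itertools import product

# Hypothetical toy inventory; "" stands for an empty onset or coda.
ONSETS = ["", "p", "t", "k", "s"]
NUCLEI = ["a", "i", "u"]
CODAS = ["", "n", "s"]


def is_permissible(onset: str, nucleus: str, coda: str) -> bool:
    """Toy constraint: a syllable may not begin and end with the same consonant."""
    return onset == "" or onset != coda


def possible_syllables() -> list[str]:
    """Enumerate every onset-nucleus-coda combination that passes the constraint."""
    return [
        onset + nucleus + coda
        for onset, nucleus, coda in product(ONSETS, NUCLEI, CODAS)
        if is_permissible(onset, nucleus, coda)
    ]


syllables = possible_syllables()
unconstrained = len(ONSETS) * len(NUCLEI) * len(CODAS)  # 5 * 3 * 3 = 45
print(unconstrained, len(syllables))
```

Without the constraint the product rule gives 5 × 3 × 3 = 45 syllables; the constraint removes the three forms "sas", "sis", and "sus", leaving 42.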

Finite-State Transducers

Context-sensitive rewriting rules of the form a → b / c _ d, used in linguistics to model phonological rules and sound change, are computationally equivalent to finite-state transducers, provided that application is nonrecursive, i.e. the rule is not allowed to rewrite the same substring twice.^[1]
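A minimal sketch of nonrecursive rule application follows. It uses regular-expression lookarounds rather than an actual transducer construction, so it only illustrates the behaviour of a rule a → b / c _ d; the function name `apply_rule` and the example rule are assumptions, and the contexts are assumed to be non-empty literal strings.

```python
import re


def apply_rule(a: str, b: str, left: str, right: str, text: str) -> str:
    """Apply the rule a -> b / left _ right in a single pass over text.

    Contexts are checked against the input string, and the output is not
    rescanned, so no rewritten substring is rewritten a second time.
    """
    pattern = re.compile(
        f"(?<={re.escape(left)}){re.escape(a)}(?={re.escape(right)})"
    )
    # A function replacement inserts b literally, with no backslash processing.
    return pattern.sub(lambda match: b, text)


# Hypothetical intervocalic voicing rule: t -> d / a _ a
print(apply_rule("t", "d", "a", "a", "atata"))
```

Both occurrences of "t" in "atata" sit between two instances of "a" in the input, so a single pass rewrites both.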

Weighted finite-state transducers (FSTs) have found applications in natural language processing, including machine translation, and in machine learning.^[2]^[3] An implementation for part-of-speech tagging is available as one component of the OpenGrm^[4] library.

Algorithms

Optimality theory (OT) and maximum entropy (Maxent) phonotactics use algorithmic approaches when evaluating candidate forms (phoneme strings) for determining the phonotactic constraints of a language.^[5]
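The following sketch shows one way candidate evaluation in a maximum entropy grammar can be pictured: each candidate receives a penalty equal to the weighted sum of its constraint violations, and its score is the exponential of the negated penalty. The two constraints, their weights, and the candidate forms are hypothetical, and normalising over a small finite candidate set is a simplifying assumption made only for this example.

```python
import math

CONSONANTS = set("ptkbdgsnŋ")


def initial_velar_nasal(form: str) -> int:
    """Hypothetical constraint: one violation if the form begins with ŋ."""
    return int(form.startswith("ŋ"))


def consonant_clusters(form: str) -> int:
    """Hypothetical constraint: one violation per pair of adjacent consonants."""
    return sum(
        1 for x, y in zip(form, form[1:]) if x in CONSONANTS and y in CONSONANTS
    )


# (weight, constraint) pairs; the weights are made up for illustration.
CONSTRAINTS = [(3.0, initial_velar_nasal), (1.5, consonant_clusters)]


def penalty(form: str) -> float:
    """Weighted sum of constraint violations for one candidate."""
    return sum(weight * constraint(form) for weight, constraint in CONSTRAINTS)


def maxent_probabilities(candidates: list[str]) -> dict[str, float]:
    """Turn penalties into probabilities over the given candidate set."""
    scores = {form: math.exp(-penalty(form)) for form in candidates}
    total = sum(scores.values())
    return {form: score / total for form, score in scores.items()}


probabilities = maxent_probabilities(["tana", "tanka", "ŋata"])
for form, p in probabilities.items():
    print(form, round(p, 3))
```

Forms with fewer or lighter violations receive more probability, so "tana" outranks "tanka", which in turn outranks "ŋata" under these assumed weights.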

Graph Theory

Trees have several applications in linguistics, including:

Other graphs that are used in linguistics include:

Weighted graphs, which are used to model the lexical similarity between different languages (after computing lexicostatistics).
Lattice graphs, which can model optimality theory.

Formal linguistics

Formal linguistics is the branch of linguistics which uses formal languages, formal grammars and first-order logical expressions for the analysis of natural languages. Since the 1980s, the term has often been used to refer to Chomskyan linguistics.^[6]

Logic

Logic is used to model syntax, formal semantics, and pragmatics. Modal logic can model syntax that employs different grammatical moods.^[7] Most linguistic universals (e.g. Greenberg's linguistic universals) employ propositional logic. Lexical relations between words can be determined based on whether a pair of words satisfies conditional propositions.^[8]

The Logical Relations of Lexical Relations
Lexical Relation	Logical Relation	Example
Synonym	$x\leftrightarrow y$	If pavement then sidewalk, and if sidewalk then pavement.
Complementary antonyms	$(x\rightarrow \neg y)\land (y\rightarrow \neg x)$	If alive then not dead, and if dead then not alive.
Gradable antonyms	$(x\rightarrow \neg y)\land (y\rightarrow \neg x)$	If good then not bad, and if bad then not good.
Relational antonyms (Nouns)	If A is B's X, then B is A's Y	If A is B's parent, then B is A's child.
Relational antonyms (Verbs)	If A Xs to B, then B Ys from A	If A gives to B, then B receives from A.
Relational antonyms (Prepositions)	If A is X B, then B is Y A	If A is below B, then B is above A.
Hyponym	X is a Y, but Y is not only an X	If a terrier, then a dog.
Cohyponym	X and Y are both Zs	A rose and a tulip are both flowers.
Meronym	the parts of a Y include the Xs	The parts of a wheel include the spokes.
Quasi-Meronym	An X belongs to a Y	A tribesman belongs to a tribe.
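Some of the conditional tests in the table can be sketched by treating each word as the set of things it applies to and checking the conditionals over a small universe. The universe, the word extensions, and the function `classify` are hypothetical; the sketch covers only the synonym, hyponym, and antonym rows.

```python
def implies(p: bool, q: bool) -> bool:
    """Material conditional p -> q."""
    return (not p) or q


def classify(ext_x: set, ext_y: set, universe: set) -> str:
    """Label the relation between two words from their extensions."""
    x_to_y = all(implies(e in ext_x, e in ext_y) for e in universe)
    y_to_x = all(implies(e in ext_y, e in ext_x) for e in universe)
    # x -> not y is logically equivalent to y -> not x, so one check suffices.
    x_excludes_y = all(implies(e in ext_x, e not in ext_y) for e in universe)
    if x_to_y and y_to_x:
        return "synonym"
    if x_to_y:
        return "hyponym"
    if x_excludes_y:
        return "antonym"
    return "unrelated"


# Hypothetical universe of entities and word extensions.
UNIVERSE = {"rex", "fido", "tom", "path1"}
terrier, dog = {"rex"}, {"rex", "fido"}
alive, dead = {"rex", "fido"}, {"tom"}
pavement, sidewalk = {"path1"}, {"path1"}

print(classify(pavement, sidewalk, UNIVERSE))
print(classify(terrier, dog, UNIVERSE))
print(classify(alive, dead, UNIVERSE))
```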

Semiotics

Methods of formal linguistics were introduced by semioticians such as Charles Sanders Peirce and Louis Hjelmslev. Building on the work of David Hilbert and Rudolf Carnap, Hjelmslev proposed the use of formal grammars to analyse, generate and explain language in his 1943 book Prolegomena to a Theory of Language.^[9]^[10] In this view, language is regarded as arising from a mathematical relationship between meaning and form.

The formal description of language was further developed by linguists including J. R. Firth and Simon Dik, giving rise to modern grammatical frameworks such as systemic functional linguistics and functional discourse grammar. Computational methods have been developed within frameworks such as functional generative description, among others.

Dependency grammar, created by French structuralist Lucien Tesnière,^[11] has been used widely in natural language processing.

Differential Equations & Multivariate Calculus

The Fast Fourier Transform, Kalman filters, and autoencoding are all used in signal processing (advanced phonetics, speech recognition).
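To make the role of the Fast Fourier Transform concrete, the sketch below implements a textbook recursive radix-2 FFT and uses it to locate the frequency of a synthetic tone. The function `fft`, the 64-sample signal, and the tone frequency are assumptions for illustration; practical speech processing would normally rely on an optimised library and windowed frames.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Synthetic "recording": a pure sine tone that completes 5 cycles in 64 samples.
N_SAMPLES = 64
TONE_BIN = 5
signal = [math.sin(2 * math.pi * TONE_BIN * t / N_SAMPLES) for t in range(N_SAMPLES)]

magnitudes = [abs(c) for c in fft(signal)]
# Only the first half of the spectrum is informative for a real-valued signal.
peak_bin = max(range(N_SAMPLES // 2), key=lambda k: magnitudes[k])
print(peak_bin)
```

The magnitude spectrum peaks at the bin matching the tone, which is the basic operation behind spectrograms and formant analysis.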

Statistics

In linguistics, statistical methods are necessary to describe and validate research results, as well as to understand observations and trends within an area of study.

Corpus statistics

Student's t-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.^[12] For a bigram $w_{1}w_{2}$, let $P(w_{1})={\frac {\#w_{1}}{N}}$ be the unconditional probability of occurrence of $w_{1}$ in a corpus with size $N$, and let $P(w_{2})={\frac {\#w_{2}}{N}}$ be the unconditional probability of occurrence of $w_{2}$ in the corpus. The t-score for the bigram $w_{1}w_{2}$ is calculated as:

$$t={\frac {{\bar {x}}-\mu }{\sqrt {\frac {s^{2}}{N}}}},$$

where ${\bar {x}}={\frac {\#w_{1}w_{2}}{N}}$ is the sample mean of the occurrence of $w_{1}w_{2}$, $\#w_{1}w_{2}$ is the number of occurrences of $w_{1}w_{2}$, $\mu =P(w_{1})P(w_{2})$ is the probability of $w_{1}w_{2}$ under the null hypothesis that $w_{1}$ and $w_{2}$ appear independently in the text, and $s^{2}={\bar {x}}(1-{\bar {x}})\approx {\bar {x}}$ is the sample variance. With a large $N$, the t-test is equivalent to a Z-test.
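The t-score can be computed directly from four counts. In the sketch below, the function name `bigram_t_score` and the corpus counts are hypothetical; the arithmetic follows the definitions of ${\bar {x}}$, $\mu$, and $s^{2}$ given above.

```python
import math


def bigram_t_score(count_w1: int, count_w2: int, count_bigram: int, n: int) -> float:
    """t-score of a bigram w1 w2 in a corpus of n tokens."""
    if count_bigram <= 0 or n <= 0:
        raise ValueError("the bigram must occur at least once in a non-empty corpus")
    x_bar = count_bigram / n                 # observed bigram probability
    mu = (count_w1 / n) * (count_w2 / n)     # expected under independence
    s_squared = x_bar * (1 - x_bar)          # sample variance
    return (x_bar - mu) / math.sqrt(s_squared / n)


# Hypothetical counts: a corpus of one million tokens.
t = bigram_t_score(count_w1=2000, count_w2=500, count_bigram=100, n=1_000_000)
print(round(t, 2))
```

The resulting value would then be compared with the critical value for the chosen significance level to decide whether to reject the independence hypothesis.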

Lexicostatistics

Lexicostatistics can model the lexical similarities between languages that share a language family, sprachbund, language contact, or other historical connections.
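One common lexicostatistic measure is the proportion of meanings on a word list for which two languages have cognate forms. The sketch below computes that proportion for every language pair and stores it as the edge weights of a small weighted graph. The language names, meanings, and cognate-class codes are invented for illustration.

```python
from itertools import combinations

# Hypothetical cognate coding: for each language, meaning -> cognate class ID.
# Two languages share a cognate for a meaning when the class IDs match.
COGNATE_CLASSES = {
    "lang_a": {"water": 1, "fire": 1, "hand": 1, "stone": 1},
    "lang_b": {"water": 1, "fire": 1, "hand": 2, "stone": 1},
    "lang_c": {"water": 2, "fire": 3, "hand": 2, "stone": 2},
}


def shared_cognate_ratio(a: dict, b: dict) -> float:
    """Fraction of jointly attested meanings with matching cognate classes."""
    meanings = a.keys() & b.keys()
    if not meanings:
        return 0.0
    matches = sum(1 for meaning in meanings if a[meaning] == b[meaning])
    return matches / len(meanings)


def similarity_graph(languages: dict) -> dict:
    """Weighted graph as a mapping from language pairs to similarity scores."""
    return {
        (x, y): shared_cognate_ratio(languages[x], languages[y])
        for x, y in combinations(sorted(languages), 2)
    }


graph = similarity_graph(COGNATE_CLASSES)
for pair, weight in graph.items():
    print(pair, weight)
```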

Quantitative linguistics

Quantitative linguistics (QL) deals with language learning, language change, and the application as well as the structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws.^[13] Synergetic linguistics was from its very beginning specifically designed for this purpose.^[14] QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions. Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.

Quantitative comparative linguistics

Quantitative comparative linguistics is a subfield of quantitative linguistics which applies quantitative analysis to comparative linguistics. It makes use of lexicostatistics and glottochronology, and borrows phylogenetic methods from biology.

References

^ "Regular Models of Phonological Rule Systems" (PDF). Archived from the original (PDF) on October 11, 2010. Retrieved August 25, 2012.
^ Kevin Knight; Jonathan May (2009). "Applications of Weighted Automata in Natural Language Processing". In Manfred Droste; Werner Kuich; Heiko Vogler (eds.). Handbook of Weighted Automata. Springer Science & Business Media. ISBN 978-3-642-01492-5.
^ "Learning with Weighted Transducers" (PDF). Retrieved April 29, 2017.
^ OpenGrm
^ Hayes, Bruce; Wilson, Colin (July 1, 2008). "A Maximum Entropy Model of Phonotactics and Phonotactic Learning" (PDF). Linguistic Inquiry. 39 (3). Massachusetts Institute of Technology: 379–440. doi:10.1162/ling.2008.39.3.379. Retrieved February 13, 2025.
^ Haspelmath, Martin (2019). "How formal linguistics appeared and disappeared from the scene". doi:10.58079/nsuq.
^ Kaufmann, S.; Condoravdi, C.; Harizanov, V. (2006). "Formal approaches to modality". In Frawley, W. (ed.). The Expression of Modality. Berlin, New York: Mouton de Gruyter.
^ Atkins, A. T.; Rundell, Michael (2008). The Oxford Guide to Practical Lexicography. USA: Oxford University Press. pp. 132–144. ISBN 978-0-19-927771-1.
^ Hjelmslev, Louis (1969) [First published 1943]. Prolegomena to a Theory of Language. University of Wisconsin Press. ISBN 0299024709.
^ Seuren, Pieter A. M. (1998). Western linguistics: An historical introduction. Wiley-Blackwell. pp. 160–167. ISBN 0-631-20891-7.
^ Tesnière, Lucien (1959). Éléments de syntaxe structurale. Klincksieck.
^ Manning, Chris; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 163–166. ISBN 0262133601.
^ Reinhard Köhler: Gegenstand und Arbeitsweise der Quantitativen Linguistik. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 1–16. ISBN 3-11-015578-8.
^ Reinhard Köhler: Synergetic linguistics. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 760–774. ISBN 3-11-015578-8.

Bibliography

Kornai, András (2008). Mathematical Linguistics. London, United Kingdom: Springer-Verlag. ISBN 978-1-84628-985-9.
Gladkij, Aleksej Vsevolodovic (1969). Elementy matematiceskoj lingvistiki [Elements of Mathematical Linguistics] (in Russian). Berlin, Germany: Mouton Publishers. ISBN 90-279-3118-6.
Kracht, Marcus (September 16, 2003). The Mathematics of Language (PDF). PO Box 951543, 450 Hilgard Avenue, Los Angeles, CA 90095–1543 USA. Retrieved February 14, 2025.
Partee, Barbara H.; ter Meulen, Alice; Wall, Robert E. (1993). Chierchia, Gennaro; Jacobson, Pauline; Pelletier, Francis J. (eds.). Mathematical Methods in Linguistics. Vol. 30. P.O. Box 17, 3300 AA Dordrecht, The Netherlands: Kluwer Academic Publishers. ISBN 978-94-009-2213-6.