Faculty of Mathematics and Informatics - Technologies of natural languages

(Scientific and applied results obtained
by the members of the Department of Computer Science
in the period 1988-2009)

Main scientific contributions

The main scientific contributions of the PU team are in the field of conceptual and linguistic modeling. The original contributions can be summarized in the following points, which present results obtained in the period 1989-2009:

  1. Based on the analysis carried out, the role of different models (cognitive, conceptual, abstract, physical, informational and computer) in the process of cognition is specified.
  2. A mathematical model of the concept of a subject domain is created, and a number of characteristic elements of the subject domain are formally defined.
  3. A common model for representing language structures and processes in natural languages is proposed. It introduces structured representations, analytic images and context (heuristic) rules for performing different kinds of analysis and for resolving linguistic polysemy.
  4. A model of a terminative automaton is created and a program prototype is implemented. The model makes it possible to carry out context-dependent lexical analysis and to resolve polysemy on the basis of grammars composed of cascades, each of which represents one type of analysis.
  5. A common mathematical model of a computer dictionary is proposed, together with a methodology for creating systems of computer dictionaries through automated investigation of standard dictionaries and text corpora.
  6. Formal and computer models of a morphological processor (capable of exact and approximate analysis and synthesis) are presented and efficiently implemented for a natural language of the inflectional type. Methods for automatic conversion of the morphological dictionary into other types of computer dictionaries have also been tested experimentally.
  7. A solid lexical base of computer dictionaries for the Bulgarian language is created. It includes an English-Bulgarian dictionary with more than 165 000 entries (lexemes), a Bulgarian synonym dictionary with about 25 000 synonym nests, a two-way link between more than 50 000 Bulgarian and English synonym sets (in the process of verification), a morphological dictionary with more than 80 000 base forms, etc.
  8. Methods and tools are created for all kinds of analysis of Bulgarian texts by means of cascade grammars: segmentation into lexemes, sentences and paragraphs (about 100 rules), morphological analysis with polysemy resolution (more than 20 context rules), syntactic analysis (with a unified attribute grammar of the Bulgarian language containing about 2 000 rules), etc.
  9. Statistical methods for automatic processing of Bulgarian text are developed (a stochastic tagger based on Markov models, a syntactic analyzer, etc.).
  10. Methods for automatic determination of lexical characteristics (stress, type of verb, etc.) and for automatic classification and processing of personal names are studied.
  11. Models are created for automatic investigation of the structure of Bulgarian text and for extraction of semantic characteristics and constituent roles from lexical resources (standard dictionaries and text corpora).
  12. A representative lexical base of the Bulgarian language, linked to the global semantic network WordNet, is built automatically (about 50 000 synonym sets, in the process of refinement and verification).
  13. The morphological dictionary and the processes of inflection and word formation are modeled by a special pair of finite automata (a two-stage acyclic automaton with labels on final states, especially efficient for analysis and synthesis of texts in inflectional languages).
  14. In the domain of semantic linguistic analysis, methods for identifying semantic characteristics and metaconcepts, and models for handling words and constructions unknown to the analyzing system, are proposed and successfully tested.
  15. A model for phonetic transcription (conversion of Bulgarian computer texts into a sequence of phonemes) is presented. On this basis, a program system for voicing Bulgarian computer texts is created; it uses a purpose-built sound database of sound files representing Bulgarian allophones.
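
The stochastic tagging based on Markov models mentioned in point 9 can be illustrated with a minimal sketch. It is not the team's implementation: it assumes a first-order hidden Markov model decoded with the standard Viterbi algorithm, and the tag set, the probabilities and the English toy words below are all hypothetical values chosen only to keep the example readable.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    Probabilities missing from the tables are replaced by `floor`
    (a crude smoothing assumption made for this sketch).
    """
    if not words:
        return []
    # best[i][t] = (log-probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{}]
    for t in tags:
        score = math.log(start_p.get(t, floor)) + math.log(emit_p[t].get(words[0], floor))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = math.log(emit_p[t].get(words[i], floor))
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans_p[p].get(t, floor))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + emit, prev)
    # Backtrack from the best final tag to recover the whole path.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Hypothetical toy model (not taken from the Bulgarian resources described above).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "walks": 0.2, "park": 0.3},
    "VERB": {"walks": 0.7, "dog": 0.1, "barks": 0.2},
}

print(viterbi(["the", "dog", "walks"], TAGS, START, TRANS, EMIT))
# prints ['DET', 'NOUN', 'VERB']
```

In this toy model the word "walks" can be a noun or a verb; the transition probabilities after a noun make the verb reading the more probable one, which is how a stochastic tagger of this kind resolves polysemy.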

Other results obtained in the course of the research:

a) obtaining (including automatically) hundreds of rules of a unification attribute grammar of the Bulgarian language and dozens of resolution rules for the analysis of Bulgarian computer text.

b) development of a linguistic processor prototype for the Bulgarian language in the form of a multipurpose program system for carrying out linguistic analysis on the cascade principle and for defining context rules in cascade grammars.

c) automated creation of a linguistic database and grammar of the Bulgarian language containing hundreds of thousands of dictionary entries (covering phonetics, segmentation and abbreviation, lemmatization, morphology, word formation, syntax, semantics, etc.), including a link to the informal standard in this field, the global lexical base WordNet.

d) development of computer methods and tools for the practical implementation of a Bulgarian language interface (especially in learning systems), for computer-supported language education, etc.
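
The cascade principle named in point b) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the prototype itself: each cascade is modeled as a plain function whose output feeds the next one, and the lexicon, the tag names and the two context rules are hypothetical English toy data.

```python
import re

# Hypothetical toy lexicon: each word form maps to the set of its possible tags.
LEXICON = {
    "the": {"DET"},
    "old": {"ADJ"},
    "man": {"NOUN", "VERB"},
    "boats": {"NOUN", "VERB"},
}

def segment(text):
    """Cascade 1: split raw text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def lookup(tokens):
    """Cascade 2: attach every candidate tag from the lexicon (UNK if missing)."""
    return [(tok, set(LEXICON.get(tok, {"UNK"}))) for tok in tokens]

def noun_after_det_or_adj(prev_tags, tags):
    """Context rule: after an unambiguous DET or ADJ, prefer the NOUN reading."""
    if prev_tags in ({"DET"}, {"ADJ"}) and "NOUN" in tags:
        return {"NOUN"}
    return tags

def verb_after_noun(prev_tags, tags):
    """Context rule: after an unambiguous NOUN, prefer the VERB reading."""
    if prev_tags == {"NOUN"} and "VERB" in tags:
        return {"VERB"}
    return tags

CONTEXT_RULES = [noun_after_det_or_adj, verb_after_noun]

def disambiguate(analyses):
    """Cascade 3: apply the context rules left to right to ambiguous tokens."""
    result = []
    prev_tags = set()
    for tok, tags in analyses:
        for rule in CONTEXT_RULES:
            if len(tags) > 1:  # only still-ambiguous tokens are touched
                tags = rule(prev_tags, tags)
        result.append((tok, tags))
        prev_tags = tags
    return result

def run_cascades(text, cascades=(segment, lookup, disambiguate)):
    """Feed the output of each cascade into the next one."""
    data = text
    for cascade in cascades:
        data = cascade(data)
    return data

print(run_cascades("The old man boats."))
# prints [('the', {'DET'}), ('old', {'ADJ'}), ('man', {'NOUN'}), ('boats', {'VERB'})]
```

Keeping every stage as a separate function means that a new kind of analysis can be added by appending one more cascade, and that context rules can be changed without touching segmentation or dictionary lookup.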

