Posts filed under ‘General’

Social Bookmarking

Social bookmarking is a method used on the Internet in order to share, organize, search, and manage bookmarks of web resources. Descriptions may be added to these bookmarks in the form of metadata, so that other users do not need to download the content of the resource. Such descriptions may be free text comments, votes in favor of or against its quality, or tags that can become a folksonomy.

With social bookmarking, users can organize their bookmarks with informal tags instead of the traditional browser-based system of folders. They can also view other bookmarks associated with a chosen tag, along with information about the number of users who have bookmarked them. Some social bookmarking services also draw inferences from the relationship of tags to create clusters of tags or bookmarks.
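The tag-based organization described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular service works: the sample data, the function names `build_tag_index` and `bookmarks_for_tag`, and the `example.org` URLs are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data: (user, url, tags) triples.
BOOKMARKS = [
    ("ana", "http://example.org/mt", {"translation", "nlp"}),
    ("ben", "http://example.org/mt", {"nlp"}),
    ("ana", "http://example.org/tags", {"folksonomy"}),
]


def build_tag_index(bookmarks):
    """Map each tag to the URLs carrying it and the users who saved each URL."""
    index = defaultdict(lambda: defaultdict(set))
    for user, url, tags in bookmarks:
        for tag in tags:
            index[tag][url].add(user)
    return index


def bookmarks_for_tag(index, tag):
    """Return (url, number_of_users) pairs for a tag, most bookmarked first."""
    urls = index.get(tag, {})
    return sorted(
        ((url, len(users)) for url, users in urls.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


index = build_tag_index(BOOKMARKS)
print(bookmarks_for_tag(index, "nlp"))
```

Looking up a tag returns every bookmark that carries it together with the number of users who saved it, which is the information a tag page typically shows.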

Many social bookmarking services provide web feeds for their lists of bookmarks, including lists organized by tags. This allows subscribers to become aware of new bookmarks as they are saved, shared, and tagged by other users.

On the one hand, social bookmarking has several advantages:

  • Unlike search engine spiders, people can find and bookmark web pages that have not yet been noticed or indexed.
  • All tag-based classification of Internet resources is done by human beings, who understand the content of the resource, as opposed to software, which can only attempt to determine the meaning of a resource algorithmically.
  • A social bookmarking system can rank a resource based on how many times it has been bookmarked by users, which may be a more useful metric for end users than systems that rank resources based on the number of external links pointing to them.
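The ranking idea in the last point can be illustrated with a short Python sketch. It assumes a hypothetical list of (user, url) pairs and counts distinct users per URL, so that one user saving the same page several times counts only once. That choice is an assumption of this example, not something a specific service is known to do.

```python
from collections import defaultdict

# Hypothetical sample data: (user, url) pairs, with one user repeating a bookmark.
SAMPLE = [
    ("ana", "a.example"),
    ("ben", "a.example"),
    ("ana", "b.example"),
    ("ana", "b.example"),
    ("ana", "b.example"),
]


def rank_by_bookmark_count(bookmarks):
    """Rank URLs by the number of distinct users who bookmarked them."""
    users_per_url = defaultdict(set)
    for user, url in bookmarks:
        users_per_url[url].add(user)
    # Most users first; ties are broken alphabetically by URL.
    return sorted(users_per_url, key=lambda url: (-len(users_per_url[url]), url))


print(rank_by_bookmark_count(SAMPLE))
```

Here `a.example` ranks first because two different users saved it, even though `b.example` was saved three times by the same user.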

On the other hand, social bookmarking also has some disadvantages:

  • Tag-based systems have no standard set of keywords and no standard for the structure of tags. They also suffer from mistagging due to spelling errors, tags that can have more than one meaning, unclear tags due to synonym/antonym confusion, unorthodox and personalized tag schemata from some users, and the lack of a mechanism for users to indicate hierarchical relationships between tags.
  • Social bookmarking can also be susceptible to corruption and collusion, as some users have started to treat it as a tool to use along with search engine optimization in order to make their websites more visible. The more often a web page is submitted and tagged, the better chance it has of being found.
  • Spammers have started bookmarking the same web page multiple times and/or tagging each page of their web site with a lot of popular tags, forcing developers to adjust their security systems to counter such abuse.

Sources of information

Social bookmarking. (2010, July 20). In Wikipedia, The Free Encyclopedia. Retrieved 11:10, July 21, 2010, from


October 22, 2009 at 6:53 pm, 3 comments

About machine translation

Machine translation (MT), known in Spanish as traducción automática (TA), is an area of computational linguistics that investigates the use of software to translate text or speech from one natural language to another. At a basic level, machine translation performs a simple substitution of the atomic words of one natural language for those of another.
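The basic word-for-word substitution described above can be sketched in Python. This is a toy illustration only: the small Spanish-to-Catalan lexicon and the function names are hypothetical, and real systems rely on much larger resources and on context.

```python
# Hypothetical toy Spanish-to-Catalan lexicon; each word lists its candidate translations.
LEXICON = {
    "el": ["el"],
    "la": ["la"],
    "altura": ["altura", "alçària"],
    "puesto": ["lloc", "parada"],
}


def translate_word_by_word(sentence, lexicon):
    """Replace each word with its first listed translation; keep unknown words as they are."""
    output = []
    for word in sentence.lower().split():
        candidates = lexicon.get(word, [word])
        output.append(candidates[0])
    return " ".join(output)


def ambiguous_words(sentence, lexicon):
    """List the words that have more than one candidate translation."""
    return [w for w in sentence.lower().split() if len(lexicon.get(w, [])) > 1]


print(translate_word_by_word("el puesto", LEXICON))
print(ambiguous_words("la altura del puesto", LEXICON))
```

The sketch always picks the first candidate, so it has no way of choosing between alternatives for an ambiguous word. Resolving that choice needs context, which plain substitution does not use.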

In recent decades there has been a strong push toward statistical techniques for developing machine translation systems. Applying these techniques to a given language pair requires a parallel corpus for that pair.

Human intervention can improve the quality of the translation; for example, some systems translate more accurately if the user first identifies the words that are proper names. With the help of these techniques, machine translation has proven to be a useful aid for human translators. However, current systems cannot produce results of the same quality as a human translator, particularly when the source text uses colloquial or informal language. This can be seen in translations from Spanish to Catalan, for example with the Translendium translator: for a word such as “puesto”, the translator hesitates between “lloc” and “parada”, and for “altura” it is unclear whether the correct translation is “altura” or “alçària”. Even so, translating between these two languages rarely causes major problems because, as Romance languages, they share the same root. With a typologically more distant language such as English, there tend to be more complications, for instance in translating subjects (“he”, “she” or “it”) and other potentially confusing words such as “relación”, which can be rendered in English as “relation” or “relationship”.

Because of these problems, it is best to have a good dictionary and/or a corpus in the target language at hand in order to resolve the context problems described above.


Wikipedia contributors. Traducción automática [online]. Wikipedia, La enciclopedia libre, 2008 [accessed: April 28, 2008]. Available at <>.

May 5, 2008 at 11:09 am

The main characteristics of a translation task according to the FEMTI report

The Framework for Machine Translation Evaluation in ISLE (FEMTI) is a resource that helps MT evaluators define contextual evaluation plans.

FEMTI is made up of two interrelated classifications. The first describes what the user needs; the second covers the quality characteristics of MT systems that are potentially of interest, along with the metrics that have been proposed to measure them.

The characteristics of the translation task are:

  • Assimilation: “the ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages”.
  • Dissemination: “the ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization”.
  • Communication: “the ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage”.


The Framework for Machine Translation Evaluation in ISLE, Retrieved April 15, 2008 from

April 15, 2008 at 8:05 pm

Some European research centres for Human Language Technologies

In my opinion, three research centres in Europe that provide suitable information about their work are the following:

  • Language Technology Lab
  • Lengoaia Naturalaren Prozesamendurako IXA Taldea
  • Edinburgh Language Technology Group

I have chosen these centres because I think their pages explain their activities and projects in a very concrete way. Moreover, they link to publications that help people find more information about Human Language Technologies.

IXA Taldea, which was created at the University of the Basque Country, promotes the modernization of Basque by developing basic computational resources for it. That is an important advance: it would be great for Basque to have more reference pages and more information, since most articles, technology etc. are usually available only in English.

Apart from that group, the other centres also do important work on Human Language Technology techniques; examples are the Language Technology Lab, which is very well organized, and the Edinburgh Language Technology Group.

April 1, 2008 at 9:39 pm

Who is Hans Uszkoreit?

Hans Uszkoreit is a computational linguist in Saarbrücken, Germany. He works as a professor at Saarland University and as scientific director at the German Research Center for Artificial Intelligence (DFKI), where he is the head of the newly founded Language Technology Lab. He is also co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”. Apart from these roles, he is also a member of many other associations, academies etc.

His interests include computer models of natural language understanding and production, advanced applications of language and knowledge technologies, and grammar formalisms and their implementation.

His most recent publications are the following:

Uszkoreit, H. (2007) Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.

Uszkoreit, H., F. Xu, W. Liu (2007) Challenges and Solutions of Multilingual and Translingual Information Service Systems. To appear in: Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007.

Uszkoreit, H., F. Xu, Weiquan Liu, J. Steffen, I. Aslan, J. Liu, C. Müller, B. Holtkamp, M. Wojciechowski (2007) A Successful Field Test of a Mobile and Multilingual Information Service System COMPASS2008. In: Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007.

H. Uszkoreit, H. Li (2007) A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity, To appear in: Proceedings of ACL 07, Annual Meeting of the Association of Computational Linguistics, Prague 2007.


Hans Uszkoreit (2008). Retrieved 1st April from

April 1, 2008 at 7:17 pm


CLARIN (Common Language Resources and Technology Infrastructure) is a research infrastructure for language resources and technology.

It offers services to different communities of linguists, to humanities scholars and to society.

The mission of CLARIN is to create a pan-European infrastructure that will boost humanities research in an area as multicultural and multilingual as Europe, and facilitate multilingual and multicultural education in schools, colleges and universities.

By language resources and technology they mean all knowledge sources based on written or spoken language and the tools to carry out operations on such language material. CLARIN language resources and technologies include:

  • Texts of all sorts, such as digitized medieval sources, web sites, newspapers, digitized books etc.
  • Multimedia recordings (audio/video) and time series recorded during communication, such as data glove or eye tracking data.
  • Various types of manually or automatically created annotations on texts, media streams etc.
  • Tools such as aligners, speech recognizers, tokenizers, part-of-speech taggers, parsers, manual annotators, viewers etc.
  • Various types of knowledge sources encapsulating knowledge about resources and languages such as metadata descriptions, GIS, lexica, concept registries, ontologies etc.

The Language Resource and Technology community works on the content of the resources and applies specific tools tailored to its research interests; the language resource community is interested in linguistic content, structures, formal semantics etc.


Common Language Resources and Technology Infrastructure (CLARIN). (2008). Retrieved March 31, 2008 from

    March 31, 2008 at 10:49 am, 1 comment

    What is Language Technology?

    Language technology or human language technology comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language.

    Some software products already have some knowledge of human language. Such products are going to change our lives because they are urgently needed to improve human-machine interaction: the main obstacle in the interaction between a human being and a computer is a communication problem. Today’s computers do not understand our language, but there is software that can listen and speak. These advances in the recognition of spoken language allow computers to help people communicate with each other.

    Because people have different mother tongues, computational linguists are trying to create software systems that simplify the work of human translators.

    The rapid growth of the Internet and the emergence of the information society pose exciting new challenges to language technology because of the increasing multilinguality of the web.

    The future of language technology will be determined by the growing need for user-friendly software.


    Language Technology Lab. DFKI-LT – What is Language Technology? by Hans Uszkoreit (2008, DFKI, Germany). Retrieved March 28th from

    March 28, 2008 at 12:59 pm
