Sunday, May 5, 2013

05/05/13

The algorithms were implemented from scratch and trained on the Reuters articles. For the frequency-rank approach, we additionally used a pre-trained implementation of the original algorithm, which we included in the evaluation as LC4J. We then used each algorithm to detect the language of the previously unused Reuters headlines and of the words obtained from dictionaries.
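To make the frequency-rank idea concrete, here is a minimal sketch of a rank-based n-gram classifier using an out-of-place distance. It is not the implementation evaluated here and not LC4J's code. The function names, the fixed n-gram length, the `top_k` cutoff of 300, and the toy training sentences are all assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=300):
    """Map each of the top_k most frequent character n-grams to its rank (0 = most frequent)."""
    padded = f" {text.lower()} "  # pad so that word boundaries form n-grams too
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty):
    """Sum of rank differences; n-grams missing from the language profile cost max_penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def detect(text, lang_profiles, n=3, top_k=300):
    """Return the language whose profile is closest to the text, or None for empty input."""
    doc = ngram_profile(text, n, top_k)
    if not doc:
        return None
    return min(
        lang_profiles,
        key=lambda lang: out_of_place(doc, lang_profiles[lang], top_k),
    )


# Toy training data (hypothetical; real profiles would be built from a large corpus).
samples = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}

print(detect("the dog", profiles))
print(detect("der hund", profiles))
```

With short inputs such as headlines or single dictionary words, the document profile contains only a handful of n-grams, so a sparse language profile quickly leads to many maximum-penalty terms.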
Table 2 shows the accuracies for detecting the language of the Reuters headlines and the dictionary entries across all algorithms and all settings of n. However, the values for LC4J must be interpreted with care: in many cases the algorithm could not detect any language at all. This may be because the language models provided with the implementation are too sparse for short texts. The values given here are based solely on those cases in which language detection was successful. When the unclassified documents are taken into account, the accuracy drops drastically to 39.24% for the headlines and to 30.33% for the dictionary words.
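The difference between the two ways of reporting accuracy can be sketched as follows. This is an illustrative helper, not the evaluation code used here: the function name is hypothetical, `None` is assumed to mark a document for which no language was detected, and the example labels are made up.

```python
def accuracies(predicted, gold):
    """Return (accuracy over classified documents only, accuracy over all documents).

    A prediction of None means the detector returned no language.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    answered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(p == g for p, g in answered)
    acc_classified = correct / len(answered) if answered else 0.0
    acc_all = correct / len(gold) if gold else 0.0
    return acc_classified, acc_all


# Made-up example: two of five documents remain unclassified.
predicted = ["en", None, "de", None, "fr"]
gold = ["en", "en", "de", "fr", "es"]
acc_classified, acc_all = accuracies(predicted, gold)
print(f"classified only: {acc_classified:.2%}, all documents: {acc_all:.2%}")
```

Counting unclassified documents as errors can only lower the score, which is why the second figure is the more conservative one.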
