Extracting per-topic temporal transitions of popular words from parallel corpora

  • We propose a new topic model [Masada+ PAKDD2011] for extracting temporal transitions of word probabilities for each topic.
  • Our model is extended for parallel corpus analysis and is applied to Chinese-English abstracts of computer science papers.
    • The years of the abstracts range from 2000 to 2009.
    • We only show five among tens of the extracted topics.
    • Each topic is represented by the top three Chinese and English words of large probability in each year.
    • No Chinese-English dictionaries are used.
    • This is a joint research with Prof. Atsuhiro Takasu in NII. The dataset was collected and cleaned up by Haipeng Zhang.