Show simple item record

dc.contributor.advisor  Kepuska, Veton
dc.contributor.author  Zhang, Wenyang
dc.creator  Zhang, Wenyang
dc.date.accessioned  2015-07-02T18:50:48Z
dc.date.available  2015-07-02T18:50:48Z
dc.date.issued  2015-03
dc.identifier.uri  http://hdl.handle.net/11141/682
dc.description  Thesis (M.S.) – Florida Institute of Technology, 2015  en_US
dc.description.abstract  SRILM is a toolkit for building and applying statistical language models (LMs), designed and developed primarily for use in speech recognition, statistical tagging and segmentation, and machine translation. It has been under development in the SRI Speech Technology and Research Laboratory since 1995, and has greatly benefited from its use and enhancement during the Johns Hopkins University/CLSP summer workshops in 1995, 1996, 1997, and 2002. This thesis studies how the smoothing method and the order of the n-gram affect language models built with the SRILM toolkit. The primary method is comparison. First, a training corpus and a testing corpus are downloaded from the web. Then, SRILM's command-line tools are used to train language models on the training corpus with different smoothing methods and n-gram orders, and each model is evaluated on the testing corpus. Each evaluation yields a perplexity, which measures the quality of the language model. The perplexities are listed and compared across smoothing methods and n-gram orders to identify the model with the minimal perplexity; that model is the best one. The experiment is then repeated with two other corpora, one for training and one for testing, to examine the effect of corpus choice on the language model. If the two groups of perplexities are the same, the choice of corpus does not affect perplexity; otherwise, it does. In conclusion, the overall approach is to compute the perplexity of each language model under different smoothing methods and n-gram orders and compare the results to find the best combination of smoothing and n-gram order, while also revealing the effect of corpus choice on language models with the same smoothing and order.  en_US
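The evaluation loop the abstract describes — train a model, score a held-out corpus, report perplexity, compare across settings — can be sketched in miniature. The following is an illustrative Python sketch, not the thesis's actual procedure: it uses simple add-k smoothing as a stand-in for SRILM's discounting methods, and the function names are invented for this example.

```python
import math
from collections import Counter

def train_bigram_lm(train_tokens, k=1.0):
    """Train a bigram LM with add-k smoothing (illustrative stand-in
    for SRILM's smoothing options)."""
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    vocab_size = len(set(train_tokens))

    def prob(prev, word):
        # Add-k smoothing: every bigram gets a pseudo-count of k,
        # so unseen bigrams still receive non-zero probability.
        return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

    return prob

def perplexity(prob, test_tokens):
    """Perplexity = exp of the average negative log-probability
    the model assigns to the test bigrams; lower is better."""
    log_prob_sum = sum(math.log(prob(p, w))
                       for p, w in zip(test_tokens, test_tokens[1:]))
    n = len(test_tokens) - 1
    return math.exp(-log_prob_sum / n)

# Compare two smoothing strengths on the same held-out data,
# mirroring the thesis's comparison across smoothing settings.
train = "a b a b a b".split()
test = "a b a b".split()
pp_light = perplexity(train_bigram_lm(train, k=0.1), test)
pp_heavy = perplexity(train_bigram_lm(train, k=10.0), test)
```

In SRILM itself, training and evaluation are done with the `ngram-count` and `ngram -ppl` command-line tools; the comparison logic — one perplexity per (smoothing, order) pair, take the minimum — is the same.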
dc.format.mimetype  application/pdf
dc.language.iso  en_US  en_US
dc.rights.uri  http://creativecommons.org/licenses/by/3.0/  en_US
dc.title  Comparing the Effect of Smoothing and N-gram Order: Finding the Best Way to Combine the Smoothing and Order of N-gram  en_US
dc.type  Thesis  en_US
dc.date.updated  2015-04-20T18:46:32Z
thesis.degree.name  Master of Science in Computer Engineering  en_US
thesis.degree.level  Masters  en_US
thesis.degree.discipline  Computer Engineering  en_US
thesis.degree.department  Electrical and Computer Engineering  en_US
thesis.degree.grantor  Florida Institute of Technology  en_US
dc.type.material  text


