Rank-frequency distribution of natural languages: a difference of probabilities approach

Tue, 27 Nov 2018 23:37:08 GMT

The time variation of the rank k of words for six Indo-European languages is obtained using data from Google Books. For low ranks the distinct languages behave differently, possibly because of syntactic rules, whereas for k > 50 the law of large numbers predominates. The dynamics of k is described stochastically through a master equation governing the time evolution of its probability density, which is approximated by a Fokker-Planck equation that is solved analytically. The difference between the data and the asymptotic solution is identified with the transient solution, and good agreement is obtained.
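
To make "the time variation of the rank k" concrete, here is a minimal Python sketch of how rank trajectories could be computed from yearly word counts. It is not the authors' pipeline: the function name `rank_by_year`, the input layout, and the toy counts are assumptions invented for illustration, and the sketch assumes every word appears in every year.

```python
from typing import Dict, List


def rank_by_year(counts: Dict[int, Dict[str, int]]) -> Dict[str, List[int]]:
    """Return each word's rank k (1 = most frequent) per year, in year order.

    Hypothetical helper: assumes every word is present in every year,
    so that all trajectories have the same length.
    """
    trajectories: Dict[str, List[int]] = {}
    for year in sorted(counts):
        yearly = counts[year]
        # Sort by descending count; break ties alphabetically for determinism.
        ordered = sorted(yearly, key=lambda w: (-yearly[w], w))
        for k, word in enumerate(ordered, start=1):
            trajectories.setdefault(word, []).append(k)
    return trajectories


# Toy counts made up for this example; they are not Google Books data.
toy_counts = {
    1900: {"the": 100, "of": 60, "and": 50},
    1901: {"the": 110, "of": 55, "and": 58},
}

ranks = rank_by_year(toy_counts)
for word, ks in ranks.items():
    print(word, ks)
```

In this toy input "of" and "and" swap ranks between the two years while "the" stays at k = 1; the stochastic description in the abstract concerns the statistics of such rank changes over many words and years.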

 

Rank-frequency distribution of natural languages: a difference of probabilities approach
Germinal Cocho, R. F. Rodríguez, Sergio Sánchez, Jorge Flores, Carlos Pineda, Carlos Gershenson

Source: arxiv.org