|dc.description.abstract||The continuously increasing number of interconnected devices, together with growing cloud service and social media usage, requires enormous amounts of mobile data. Due to the rapid increase in cellular data demand, accurate cellular data prediction has become more critical. The growing number of mobile subscribers has also raised the question of whether users' locations can be determined. The most important uses of geolocation technology are for emergencies and security. Beyond emergency and security applications, geolocation technology can also decrease cellular network maintenance costs. Mobile network systems have become highly complex, with a large number of parameters and feature add-ons. As this complexity has increased, traditional methods have become insufficient for network management, and an advanced optimization approach such as machine learning is necessary. The increased measurement sensitivity of received radio parameters has enabled many applications to obtain dependable results. This thesis proposes a reliable solution for uplink data rate prediction and device geolocation that feeds LTE radio parameters into machine learning algorithms. We first performed an extensive LTE data collection campaign in three distinct locations and determined the LTE lower-layer parameters correlated with uplink (UL) throughput.
The selected LTE parameters that are highly correlated with UL throughput (RSRP, RSRQ, and SNR) are used to train five different learning algorithms for estimating UL data rates. Our evaluations show that the Decision Tree and K-nearest Neighbor algorithms outperform the other algorithms in throughput estimation. Prediction accuracies, measured by the coefficient of determination (R²), of 92%, 85%, and 69% are obtained for Melbourne, FL; Batman, Turkey; and Houston, TX, respectively. Two extensive LTE measurement datasets collected from one of the major US cellular carriers are used for device geolocation. After feature extraction and data analysis, the received signal strengths of one serving and up to eight neighbor base stations are used to train machine learning algorithms. When 90% of the data is used as the training set and 10% as the test set for the K-nearest Neighbor algorithm, a mean distance error of 34.9 meters with a standard deviation of 147.5 meters is achieved on the data measured in San Francisco, CA.||en_US
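The abstract does not describe how its K-nearest Neighbor pipeline is implemented. The following is a minimal, standard-library-only sketch of the evaluation it summarizes: fixed-length received-signal-strength vectors (one serving cell plus up to eight neighbors), a 90%/10% train/test split, and the mean and standard deviation of the distance error. It also includes an R² helper for the throughput task. Several choices here are assumptions, not the thesis's method: Euclidean distance, unweighted neighbor averaging, strongest-first neighbor ordering, and a -140 dBm fill value for unreported neighbors. All function names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical fill value for neighbor cells that were not reported (assumption).
MISSING_RSS_DBM = -140.0


def pad_rss(serving, neighbors, n_neighbors=8, floor=MISSING_RSS_DBM):
    """Build a fixed-length feature vector: serving RSS, then up to
    n_neighbors neighbor RSS values (strongest first), padded with `floor`."""
    ordered = sorted(neighbors, reverse=True)[:n_neighbors]
    padding = [floor] * (n_neighbors - len(ordered))
    return tuple([float(serving)] + [float(v) for v in ordered] + padding)


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training rows nearest to `query`
    (Euclidean distance, uniform weights). Targets may be scalars
    (e.g. UL throughput) or (lat, lon) tuples."""
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    nearest = [train_y[i] for _, i in ranked[:k]]
    if isinstance(nearest[0], tuple):
        # Component-wise mean of coordinates; only reasonable over a small area.
        return tuple(sum(c) / len(c) for c in zip(*nearest))
    return sum(nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def haversine_m(a, b, radius_m=6_371_000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(h))


def split_90_10(rows, seed=0):
    """Shuffle a copy of `rows` and split it 90% train / 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.9 * len(rows))
    return rows[:cut], rows[cut:]


def evaluate_geolocation(samples, k=5, seed=0):
    """samples: list of (rss_vector, (lat, lon)).
    Returns (mean, population standard deviation) of the distance error in meters."""
    train, test = split_90_10(samples, seed)
    X = [rss for rss, _ in train]
    Y = [loc for _, loc in train]
    errors = [haversine_m(knn_predict(X, Y, rss, k), loc) for rss, loc in test]
    return statistics.mean(errors), statistics.pstdev(errors)
```

For the throughput task, the same `knn_predict` can be called with (RSRP, RSRQ, SNR) tuples as features and scalar UL throughput as targets, and the predictions can then be scored with `r2_score`. Uniform averaging keeps the sketch simple. A distance-weighted average is a common alternative, and in practice features with different units (dBm, dB) would usually be scaled before computing distances.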