
postgraduate thesis: Web user profiling and tracking based on behavior analysis

Title: Web user profiling and tracking based on behavior analysis
Authors: Fan, Xiaoxi (范晓曦)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Fan, X. [范晓曦]. (2015). Web user profiling and tracking based on behavior analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5731085
Abstract: In the 1990s, the introduction of the World Wide Web led to a rapid growth in the number of web users, which brought new challenges. The Internet turned computer crimes into transnational crimes. In the 21st century, new trends in computer crime and cybercrime were continuously discovered, and the methods of committing crimes became more sophisticated; in Internet piracy, for example, offenders used anonymous cyberlocker services to distribute copyrighted resources. In response to these growing challenges, countries have given cybercrime a higher priority. A reliable framework that can profile and track user identity would help law enforcement agencies investigate cybercrimes. In this dissertation, we study two issues related to web user profiling and tracking. First, the data left on a computer is crucial for detecting the user's identity. Since a user's online behavior characteristics and interest patterns can be extracted from web browser history, we proposed a web user identification model based on a profiling technique. The model constructs a user profile from his/her web browsing activities. Two aspects of browsing behavior were examined to construct the profile: the user's page view number (PVN) and page view time (PVT) for each browsed domain. Combining these with the Term Frequency (TF) and Term Frequency – Inverse Document Frequency (TFIDF) weighting schemes, four weighting models were proposed and compared for finding profiles similar to a target profile. The resulting profiles have a high probability of belonging to the same real-world person as the target profile. Experiments were conducted on 51 real-world computers, and the results showed that the model can uniquely identify web users effectively. TFIDF generally performs better than TF, and the TFIDF-PVN weighting model achieved the best result in accurately recognizing web users.
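The TF/TFIDF weighting of per-domain browsing counts described above can be illustrated with a short sketch. It is not the thesis's implementation: the abstract does not specify the TF normalization, the IDF formula or the similarity measure, so this sketch assumes TF is each domain's share of a user's total page views, IDF is log(N/df) computed over the target plus all candidate profiles, and candidates are ranked by cosine similarity. All function names and example domains are hypothetical. The same functions accept PVT (seconds per domain) in place of PVN.

```python
import math
from collections import Counter


def tf(profile):
    """TF: each domain's share of the user's total PVN (or PVT)."""
    total = sum(profile.values())
    return {d: v / total for d, v in profile.items()} if total else {}


def idf(profiles):
    """IDF: log(N / df), where df is the number of profiles containing the domain."""
    n = len(profiles)
    df = Counter(d for p in profiles for d in p)
    return {d: math.log(n / c) for d, c in df.items()}


def tfidf(profile, idf_weights):
    """TF-IDF: TF scaled down for domains that almost every user visits."""
    return {d: w * idf_weights.get(d, 0.0) for d, w in tf(profile).items()}


def cosine(a, b):
    """Cosine similarity between two sparse domain -> weight vectors."""
    dot = sum(w * b.get(d, 0.0) for d, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(target, candidates, use_idf=True):
    """Return (name, score) pairs, most similar to the target first."""
    # Assumption: IDF is computed over the target plus all candidate profiles.
    corpus = [target, *candidates.values()]
    weights = idf(corpus) if use_idf else {}

    def weigh(p):
        return tfidf(p, weights) if use_idf else tf(p)

    t = weigh(target)
    scores = [(name, cosine(t, weigh(p))) for name, p in candidates.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-domain page view numbers (PVN) for a target and two candidates.
    target = {"news.example": 40, "forum.example": 25, "mail.example": 35}
    candidates = {
        "pc01": {"news.example": 38, "forum.example": 30, "mail.example": 32},
        "pc02": {"video.example": 50, "mail.example": 45, "shop.example": 20},
    }
    for name, score in rank_candidates(target, candidates):
        print(name, round(score, 3))
```

With this toy data, pc01 scores close to 1 and pc02 scores 0 under TF-IDF, because the only domain pc02 shares with the target (mail.example) is visited by every profile and therefore receives zero IDF weight. Under plain TF (`use_idf=False`) that shared domain still contributes, which is one intuition for why IDF weighting can separate users better.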
Second, besides the data left on a computer, much digital evidence can be found on the Internet. In recent years, cyberlocker services have become popular, making it easy for people to distribute infringing copies of copyrighted media on the Internet. We therefore studied the issue of cyberlocker-based piracy. Because cyberlocker services are anonymous, we proposed a model that collects cyberlocker-related data from public forums, where the sharing of cyberlocker links can be connected to a specific user. We then built forum user profiles based on their cyberlocker link sharing activities. The relationships between profiles were analyzed with multidimensional scaling and agglomerative hierarchical clustering, which allowed us to detect forum users with similar sharing behaviors. Furthermore, we introduced five categories of behavioral characteristics to construct more comprehensive profiles. Experiments were conducted on real data collected from popular forums in Hong Kong, and the results indicated that the model can effectively detect profiles with similar file sharing characteristics for identity tracking. The model also generated a taxonomy of forum users containing several general profiles, which can help law enforcement agencies determine the behavioral patterns of cyberlocker link sharing for different types of users.
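The agglomerative hierarchical clustering step mentioned above can be sketched as plain bottom-up clustering of forum-user profiles. The abstract names only the technique, so the features (each user's share of posted links per cyberlocker service), the Euclidean distance and the average-linkage merge criterion are assumptions for illustration; the multidimensional scaling step is omitted. Names such as `lockerA` and `agglomerative` are hypothetical.

```python
import math
from itertools import combinations


def normalize(profile):
    """Turn raw link counts per cyberlocker service into proportions."""
    total = sum(profile.values())
    return {k: v / total for k, v in profile.items()} if total else {}


def euclidean(a, b):
    """Euclidean distance between two sparse feature dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(profiles, n_clusters):
    """Start from singleton clusters and repeatedly merge the closest pair."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    vectors = {user: normalize(p) for user, p in profiles.items()}
    clusters = [[user] for user in sorted(vectors)]
    merges = []  # (cluster_a, cluster_b, distance) in merge order, i.e. the dendrogram
    while len(clusters) > n_clusters:
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], vectors))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


if __name__ == "__main__":
    # Hypothetical counts of links each forum user posted per cyberlocker service.
    profiles = {
        "u1": {"lockerA": 9, "lockerB": 1},
        "u2": {"lockerA": 8, "lockerB": 2},
        "u3": {"lockerC": 5, "lockerB": 5},
        "u4": {"lockerC": 6, "lockerB": 4},
    }
    clusters, merges = agglomerative(profiles, n_clusters=2)
    print(clusters)
    for a, b, d in merges:
        print(a, "+", b, "at distance", round(d, 3))
```

Recording the merge sequence, rather than only the final partition, keeps the full hierarchy available, so the same run can be cut at different levels to produce coarser or finer groups of users with similar sharing behavior.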
Degree: Doctor of Philosophy
Subject: Internet users; Web usage mining
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/224635

 

DC Field | Value | Language
dc.contributor.author | Fan, Xiaoxi | -
dc.contributor.author | 范晓曦 | -
dc.date.accessioned | 2016-04-11T23:15:15Z | -
dc.date.available | 2016-04-11T23:15:15Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Fan, X. [范晓曦]. (2015). Web user profiling and tracking based on behavior analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5731085 | -
dc.identifier.uri | http://hdl.handle.net/10722/224635 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.subject.lcsh | Internet users | -
dc.subject.lcsh | Web usage mining | -
dc.title | Web user profiling and tracking based on behavior analysis | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5731085 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
