Training on the Dark Web?
Dark Web ChatGPT Unleashed: Meet DarkBERT in CACM By Tom's Hardware, May 19, 2023
Researchers at South Korea's Korea Advanced Institute of Science and Technology (KAIST) and data intelligence company S2W have created a large language model (LLM) trained on Dark Web data.
To train the model, the researchers crawled the Dark Web through the Tor network, then filtered the raw data (applying techniques such as deduplication, category balancing, and data pre-processing) to generate a Dark Web database.
The researchers fed the database they compiled from the Dark Web via the Tor network into the RoBERTa framework to create the DarkBERT LLM, which can analyze and extract useful information from new Dark Web content written in the Dark Web's own dialects and heavily coded messages.
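The deduplication step in the filtering pipeline described above could, for illustration, look like exact-match filtering on normalized page text. This is a hedged sketch, not the paper's actual pipeline; the `deduplicate` function and the normalization choices are assumptions.

```python
import hashlib

def deduplicate(pages):
    """Keep only the first copy of each page, comparing normalized text.

    Normalization (strip + lowercase) and SHA-256 hashing are illustrative
    choices; a real crawl pipeline might use near-duplicate detection instead.
    """
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(page.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique

# Example: the second page differs only in case and whitespace, so it is dropped.
corpus = ["Hidden service A", "hidden service a  ", "Hidden service B"]
print(deduplicate(corpus))
```

Exact-match deduplication like this removes verbatim mirrors, which are common on crawled onion sites; category balancing and further pre-processing would then operate on the reduced corpus.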
They demonstrated that DarkBERT outperforms other LLMs on Dark Web-related tasks, which should enable security researchers and law enforcement to delve deeper into the Dark Web.
From Tom's Hardware