ANALYSING DATA WITH SOFTWARE TO FIND INFORMATION
A mass of data that grows rapidly every day is a reality nowadays. rsExtract was developed so that users and analysts in security organisations and companies can summon up, day after day, the inquisitiveness and energy to pore over masses of data, a task reminiscent of Sisyphus and his huge boulder. Whether the material is confiscated data media, seized website content, chat logs or e-mails, in different file formats, in different languages and often unstructured: modern linguistics and machine learning, in combination with software, sort even the most complex collections of data into a clear decision matrix.
THE SOLUTION: LINGUISTICS RESEARCH “CAST” IN SOFTWARE
rsExtract simplifies the analysis of unstructured data via parallel searches in any number of data pools. A search for words, groups of words and elements of words in more than 500 file formats, combined with the capabilities of the fuzzy search, the phonetic search and the search for synonyms, results in a tool that is more than just a search engine. Finding data faster is not rocket science. The search algorithms in rsExtract automatically complete search terms, correct typing errors and draw attention to different spellings, and they do so in dozens of languages simultaneously. Police authorities, intelligence services, the military, tax investigation departments, corporate security units and other security organisations now have a software tool based on the latest linguistics research at their disposal. Computer-supported data retrieval and analysis of this kind frees up time for on-the-spot investigations. rsExtract is available as a standalone application or as a functional extension in the solutions of the rsFrame product suite.
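How a fuzzy search and a phonetic search complement each other can be illustrated with a minimal Python sketch. It is not rsExtract's implementation, which is not described here. The vocabulary, the function names and the simplified Soundex-style key are assumptions made purely for illustration; the fuzzy part relies on the standard library's difflib.

```python
import difflib

# Hypothetical index vocabulary (assumption, for illustration only).
VOCABULARY = ["invoice", "investigation", "meyer", "account", "transfer"]

# Consonant groups of a simplified Soundex-style code.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def phonetic_key(word: str) -> str:
    """Return the first letter plus up to three digits for similar-sounding consonants."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (key + "000")[:4]


def search(term: str, vocabulary=VOCABULARY) -> dict:
    """Look a term up by spelling similarity (fuzzy) and by sound (phonetic)."""
    fuzzy = difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=0.7)
    key = phonetic_key(term)
    phonetic = [word for word in vocabulary if phonetic_key(word) == key]
    return {"fuzzy": fuzzy, "phonetic": phonetic}


if __name__ == "__main__":
    print(search("invioce"))  # a typing error
    print(search("Maier"))    # a different spelling of the same-sounding name
```

In this sketch the typing error "invioce" is caught by the fuzzy comparison, while the spelling variant "Maier" is linked to "meyer" only through the phonetic key, which is why the two techniques are usually combined.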
AT A GLANCE
Searching: Simple and combinable
With rsExtract, large amounts of data can be searched quickly. Combining search terms with easy-to-use filters yields precise search results.
New quality: Automated structuring
With rsExtract, automated support is provided for the extraction of information from unstructured texts. rsExtract extracts entities, for example Name, Address and Account Number from an invoice, and makes these available to the analyst for further filtering of the search results.
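The invoice example can be sketched in a few lines of Python. This is only an illustration of the idea and not rsExtract's method: the sample invoice text, the field patterns and the function name are invented for this sketch, and a real extraction component would need far more robust rules.

```python
import re

# Made-up sample invoice (assumption, for illustration only).
INVOICE = """Invoice 2024-017
Name: Erika Mustermann
Address: Hauptstrasse 12, 10115 Berlin
Account Number: DE89 3704 0044 0532 0130 00
"""

# Hypothetical patterns: two labelled fields and an IBAN-like account number.
FIELD_PATTERNS = {
    "Name": re.compile(r"^Name:\s*(.+)$", re.MULTILINE),
    "Address": re.compile(r"^Address:\s*(.+)$", re.MULTILINE),
    "Account Number": re.compile(r"\b([A-Z]{2}\d{2}(?: ?[A-Z0-9]{2,4}){3,8})\b"),
}


def extract_entities(text: str) -> dict:
    """Return the first match of every field pattern found in the text."""
    entities = {}
    for label, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            entities[label] = match.group(1).strip()
    return entities


if __name__ == "__main__":
    for label, value in extract_entities(INVOICE).items():
        print(f"{label}: {value}")
```

The resulting dictionary is the kind of structured record an analyst could then use to filter search results, for example by showing only documents with a given account number.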
Tapping into data pools with a mouse click
With rsExtract, a multitude of different systems can be connected. rsExtract establishes the necessary connections to the data pools via interfaces and connectors.
Data protection in the software DNA
With rsExtract, the data remain protected against access by third parties. This is anchored in the software DNA. A rights and roles concept in combination with a logging function guarantees compliance with data protection requirements.
A SEAMLESS WORKFLOW
Automatic entity detection
A combination of rule-based and list-based detection ensures high-quality results. The detected entities such as Person, Location or URL can be displayed as independent facets and used as filters.
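A minimal Python sketch can show how list-based and rule-based detection might feed facets that double as filters. The word lists, the URL rule, the sample documents and all function names are assumptions chosen for illustration; they say nothing about how rsExtract is actually built.

```python
import re
from collections import defaultdict

# List-based detection: hypothetical word lists per entity type.
GAZETTEER = {
    "Person": {"Edward Snowden"},
    "Location": {"Berlin", "Moscow"},
}

# Rule-based detection: a deliberately simple pattern per entity type.
RULES = {"URL": re.compile(r"https?://[^\s,;]+")}


def detect_entities(text: str) -> dict:
    """Return {entity type: set of values} found via the lists and the rules."""
    found = defaultdict(set)
    for etype, names in GAZETTEER.items():
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found[etype].add(name)
    for etype, pattern in RULES.items():
        for value in pattern.findall(text):
            found[etype].add(value)
    return dict(found)


def build_facets(documents: dict) -> dict:
    """Map every entity type and value to the ids of the documents containing it."""
    facets = defaultdict(lambda: defaultdict(set))
    for doc_id, text in documents.items():
        for etype, values in detect_entities(text).items():
            for value in values:
                facets[etype][value].add(doc_id)
    return facets


if __name__ == "__main__":
    docs = {
        "d1": "Edward Snowden travelled to Moscow.",
        "d2": "Meeting in Berlin, see https://example.org/report for details.",
    }
    facets = build_facets(docs)
    # Using a facet as a filter: which documents mention Moscow?
    print(sorted(facets["Location"]["Moscow"]))
```

Keeping the lists and the rules separate lets known names be maintained as data, while patterns cover entities such as URLs that cannot be enumerated in advance.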
Technical metadata such as file name, file type or author is read out, stored together with the document's functional metadata, and can also be used as search or filter criteria.
Unstructured and previously unexamined items of information are logically related to each other. Additional facets are generated for terms or phrases associated with the search query, and these suggest further search terms, e.g. for “NSA”: “Snowden”, “leaks” and “enquiry commission”.
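One simple way to arrive at associated search terms is to count which words occur in the same documents as the query term. The following Python sketch uses that co-occurrence idea as a stand-in; the text does not say which technique rsExtract uses, and the stop-word list, the sample documents and the function name are assumptions.

```python
from collections import Counter

# Small hypothetical stop-word list (assumption).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "on", "by", "for"}


def associated_terms(query: str, documents: list, top_n: int = 3) -> list:
    """Suggest the terms that most often share a document with the query term."""
    query = query.lower()
    counts = Counter()
    for text in documents:
        tokens = {token.strip(".,;:()").lower() for token in text.split()}
        if query in tokens:
            counts.update(
                token for token in tokens
                if token and token != query and token not in STOPWORDS
            )
    return [term for term, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    docs = [
        "Snowden leaks on the NSA",
        "NSA leaks trigger enquiry",
        "Snowden and the NSA enquiry",
        "Weather report for Berlin",
    ]
    print(associated_terms("NSA", docs))
```

The suggestions depend entirely on the documents in the data pool, so the same query can yield different associated terms in different collections.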
Similarity search and recognition of duplicates
Documents that relate to the search query in a similar way are identified automatically. Duplicate recognition identifies and excludes documents with identical content.
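The two ideas can be sketched in Python as follows: identical content is detected by comparing a hash of the normalised text, and similarity is approximated by the Jaccard overlap of the word sets. Both techniques are common choices used here as assumptions; the text does not state which methods rsExtract applies, and the function names and sample documents are hypothetical.

```python
import hashlib


def normalise(text: str) -> str:
    """Lower-case the text and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    """SHA-256 hash of the normalised text; equal content gives equal hashes."""
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


def drop_duplicates(documents: dict) -> dict:
    """Keep only the first document for each distinct fingerprint."""
    seen = set()
    unique = {}
    for doc_id, text in documents.items():
        digest = fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    words_a = set(normalise(a).split())
    words_b = set(normalise(b).split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    docs = {
        "a": "Transfer of 500 EUR to account X",
        "b": "transfer  of 500 EUR to account x",
        "c": "Transfer of 900 EUR to account Y",
    }
    print(list(drop_duplicates(docs)))
    print(round(jaccard(docs["a"], docs["c"]), 2))
```

Here documents "a" and "b" differ only in case and spacing and are treated as duplicates, while "a" and "c" remain separate hits with a measurable overlap.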
Thesauruses and knowledge networks
Hierarchical thesauruses and knowledge networks can also be integrated. With only a few mouse clicks, several thousand search results can be filtered down to a manageable and meaningful list of hits. For example, a search for the term “drugs” also returns hits for “heroin”, “snow”, “diacetylmorphine” and “C21H23NO5”.
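Query expansion with a hierarchical thesaurus can be sketched in Python as below. The terms come from the example above, but their arrangement into a hierarchy, the sample documents and the function names are assumptions made for illustration.

```python
# Hypothetical thesaurus: each term points to narrower terms or synonyms.
THESAURUS = {
    "drugs": ["heroin"],
    "heroin": ["snow", "diacetylmorphine", "C21H23NO5"],
}


def expand(term: str, thesaurus: dict = THESAURUS) -> list:
    """Return the term followed by every term reachable below it."""
    expanded = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:  # guards against cycles in a knowledge network
            continue
        expanded.append(current)
        stack.extend(reversed(thesaurus.get(current, [])))
    return expanded


def search(term: str, documents: dict) -> list:
    """Return the ids of documents containing the term or any expanded term."""
    needles = [t.lower() for t in expand(term)]
    return [
        doc_id for doc_id, text in documents.items()
        if any(needle in text.lower() for needle in needles)
    ]


if __name__ == "__main__":
    docs = {
        "chat-1": "He asked for more snow by Friday.",
        "lab-7": "Sample identified as C21H23NO5.",
        "mail-3": "Lunch at noon?",
    }
    print(expand("drugs"))
    print(search("drugs", docs))
```

A plain substring match is used to keep the sketch short; it would also match a word such as "snowfall", which is why a real system would combine the expansion with proper tokenisation and the filters described above.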