Computer Scientists Developing Technology To Improve Data Mining For Homeland Security

From online news articles to blogs, a massive amount of information is voluntarily being put before the public every day.

Some of this information may be valuable for protecting homeland security. However, to sift through this readily available content and summarize it for agencies like the Department of Homeland Security, analysts need to do more than sit at a computer, entering words like "al-Qaida" into Internet search engines.

That's why Kansas State University's William Hsu and other computer scientists who research data mining are part of a project to develop technology that makes automated Internet searches more useful and productive.

"We're helping to develop the next generation of Web search and crawling," Hsu said. "Our goal is to develop a research program that will help with homeland security. The Department of Homeland Security wants to pull information that's available to anyone in the public domain, like millions of articles from sources like CNN and Al-Jazeera, and monitor them for security."

Hsu is an associate professor of computer and information sciences, head of K-State's Laboratory for Knowledge Discovery in Databases, and co-principal investigator of a Department of Homeland Security-funded summer institute aimed at training future researchers in data sciences. The $2.4 million Data Sciences Summer Institute, headed by the University of Illinois along with K-State and the University of Texas at San Antonio, is titled "Multimodal Information Access and Synthesis." The Illinois-led cooperative is one of four such University Affiliate Centers nationwide.

Data mining is a way of processing vast amounts of information and putting it in multiple, useful formats. Hsu's data mining research at K-State includes applications in fields like genome analysis, nanoscale materials modeling and diagnostic medicine. The work at K-State that will benefit homeland security strives to resolve ambiguity in Internet searches. For instance, this would allow a search engine to differentiate between homeland security as a concept and Homeland Security as a government agency. Hsu said that one of the institute's projects aims to improve name recognition, a heavily studied problem in information extraction.
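As a rough illustration of name recognition, the sketch below learns a lookup table from a few hand-labeled documents and uses it to tag names in new text. It is a minimal example under stated assumptions, not the institute's system: the function names, the labels and the tiny training set are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def learn_gazetteer(annotated_docs):
    """Map each name to the label it carried most often in the labeled documents."""
    counts = defaultdict(Counter)
    for doc in annotated_docs:
        for name, label in doc:
            counts[name.lower()][label] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def tag_entities(text, gazetteer):
    """Return (matched text, label) pairs in order of appearance."""
    found = []
    taken = [False] * len(text)  # characters already claimed by a longer name
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                found.append((m.start(), m.group(0), gazetteer[name]))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return [(matched, label) for _, matched, label in sorted(found)]


# Hypothetical labeled examples standing in for previously seen documents.
training = [
    [("Kandahar", "PLACE"), ("Osama bin Laden", "PERSON")],
    [("Department of Homeland Security", "ORGANIZATION"), ("Kandahar", "PLACE")],
]
gazetteer = learn_gazetteer(training)
text = ("Osama bin Laden was last reported near Kandahar, "
        "the Department of Homeland Security said.")
print(tag_entities(text, gazetteer))
```

A plain lookup like this cannot tell "homeland security" the concept from the agency, and it knows nothing about subcategories. Those are the gaps the research described here is meant to address.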

"The goal is to develop an automated system that can pick out al-Qaida as an organization, Kandahar as a place and Osama bin Laden as a person, based upon rules developed from previously seen documents," Hsu said. "Subcategories are a problem," he said. "'People' is a big tag. Is this a head of state? A celebrity? Someone who was interviewed?"

Data mining research at K-State and collaborating institutions is also helping solve another problem with getting information off the Internet -- inefficient crawling. Search engines provide up-to-date results by first looking through vast numbers of Web pages and archiving them, a process called crawling, Hsu said. He said the project leader, Kevin Chang at the University of Illinois, describes the problem with this process as "crawling in the dark -- you start somewhere and grab everything." Research in this area will lead to better searches in which search engines can, for instance, anticipate keywords, Hsu said. Search engines also could create virtual neighborhoods of information in which connections are made among bits of information based on the results of similar searches.
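One common alternative to grabbing everything is a focused, or best-first, crawl that always expands the most promising page it knows about. The sketch below illustrates that general idea on a small in-memory link graph. It is not the project's method: the `pages` dictionary, the keyword-count score and the function names are assumptions made for the example. A real crawler would have to estimate relevance from link text or context before fetching a page.

```python
import heapq


def relevance(text, keywords):
    """Count how many times any keyword appears in the text."""
    words = text.lower().split()
    return sum(words.count(k) for k in keywords)


def focused_crawl(pages, start, keywords, budget):
    """Visit up to `budget` pages, most relevant known page first."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-relevance(pages[start]["text"], keywords), start)]
    seen = {start}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in pages[url]["links"]:
            if link in pages and link not in seen:
                seen.add(link)
                score = relevance(pages[link]["text"], keywords)
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical toy "Web": page id -> text and outgoing links.
pages = {
    "a": {"text": "news portal", "links": ["b", "c", "d"]},
    "b": {"text": "sports scores and sports news", "links": ["e"]},
    "c": {"text": "airport security alert security update", "links": ["f"]},
    "d": {"text": "celebrity gossip", "links": []},
    "e": {"text": "more sports", "links": []},
    "f": {"text": "border security briefing", "links": []},
}
print(focused_crawl(pages, "a", ["security"], budget=3))  # ['a', 'c', 'f']
```

With a budget of three pages, the crawl spends its visits on the security-related branch instead of working through every link on the start page.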

Although text-based searches have their complications, Hsu said searching for images is even harder because searches rely on the words people use to describe the images, such as a photo caption. Data mining research at K-State and its partner institutions is leading to technology that will allow search engines to "look" through images from the Web. Hsu said search engines would sift through images that are automatically annotated, or marked up, to describe their contents. This would be done using tools that analyze the shape, border, color and orientation of objects, among many other features, to pick out, for instance, an image of George W. Bush in a press conference photo.

"Computers will figure out an image identity by 'seeing' a feature that all such images have in common," Hsu said.

The next generation of data mining research, Hsu said, will involve computer scientists working with social scientists. By scouring news articles and other public data, researchers can work on something called sentiment analysis.
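In its simplest form, sentiment analysis can be approximated by counting positive and negative words. The sketch below does that for a handful of short texts. The word lists, function names and sample sentences are hypothetical, and systems used in research are far more sophisticated than this.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"calm", "safe", "relieved", "grateful", "confident"}
NEGATIVE = {"afraid", "angry", "panic", "worried", "unsafe"}


def sentiment_score(text):
    """Score from -1.0 (all negative words) to 1.0 (all positive words)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def average_reaction(articles):
    """Average the per-article scores to summarize the overall reaction."""
    scores = [sentiment_score(a) for a in articles]
    return sum(scores) / len(scores) if scores else 0.0


articles = [
    "Residents were worried and angry after the threat.",
    "Neighbors said they felt safe and relieved by the response.",
    "The council met on Tuesday.",
]
for article in articles:
    print(sentiment_score(article), article)
print("overall:", average_reaction(articles))
```

Word counting of this kind misses negation, sarcasm and context.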

"Sometimes Homeland Security just needs to know, for instance, what the local reaction is to a particular event such as a bomb threat or recent explosion," Hsu said.
