Loren Data's SAM Daily™

FBO DAILY ISSUE OF DECEMBER 23, 2012 FBO #4047
SOURCES SOUGHT

D -- Information Volume & Velocity (IV2)

Notice Date
12/21/2012
 
Notice Type
Sources Sought
 
NAICS
541511 — Custom Computer Programming Services
 
Contracting Office
Defense Information Systems Agency, Procurement Directorate, DITCO-NCR, P.O. BOX 549, FORT MEADE, Maryland, 20755-0549, United States
 
ZIP Code
20755-0549
 
Solicitation Number
IV2_SOURCES_SOUGHT
 
Archive Date
1/26/2013
 
Point of Contact
Kim Oanh P. Scott, Suzanne Rippenbaum
 
E-Mail Address
kimoanh.p.scott.civ@mail.mil, Suzanne.M.Rippenbaum.Civ@mail.mil
 
Small Business Set-Aside
N/A
 
Description
SOURCES SOUGHT ANNOUNCEMENT
For Defense Information Systems Agency (DISA), Chief Technology Office
For Information Volume & Velocity (IV2)
CONTRACTING OFFICE ADDRESS: DISA, Procurement Directorate (PLD), Defense Information Technology Contracting Organization (DITCO)-NCR, P.O. Box 549, Fort Meade, Maryland 20755.
INTRODUCTION: This is a SOURCES SOUGHT TECHNICAL DESCRIPTION to determine the availability and technical capability of small businesses (including the following subsets: HUBZone firms, certified 8(a) firms, Service-Disabled Veteran-Owned Small Businesses, and Women-Owned Small Businesses) to provide the required products and/or services. DISA's Chief Technology Officer is seeking information on potential sources for engineering support and a mobile application that covers all of the following: discovery, source types, media types, languages, parsing, scheduling, date slicing, image recognition, collection architecture, QA processes, audit alerts, logging, content IDs, recursive mining, and anonymization (proprietary global proxy integration), together with open source technology, third party technology, loosely coupled architecture, interoperability, tech transfer, language translation, reporting, compliance, and flexible discovery and collection.
DISCLAIMER: THIS SOURCES SOUGHT IS FOR INFORMATIONAL PURPOSES ONLY. THIS IS NOT A REQUEST FOR PROPOSAL. IT DOES NOT CONSTITUTE A SOLICITATION AND SHALL NOT BE CONSTRUED AS A COMMITMENT BY THE GOVERNMENT. IN ACCORDANCE WITH (IAW) FAR 15.201(e), RESPONSES TO THIS SOURCES SOUGHT ARE NOT OFFERS AND CANNOT BE ACCEPTED BY THE GOVERNMENT FOR FORMING A BINDING CONTRACT. NO FUNDS ARE AVAILABLE TO PAY FOR PREPARATION OF RESPONSES TO THIS ANNOUNCEMENT. ANY INFORMATION SUBMITTED BY RESPONDENTS TO THIS TECHNICAL DESCRIPTION IS STRICTLY VOLUNTARY.
CONTRACT/PROGRAM BACKGROUND: This is a new requirement.
Period of Performance (PoP): A base period and four one-year options.
REQUIRED CAPABILITIES: The Government requires contractor staff to assist with the operation, engineering, and development of the secure mobile application. The contractor shall provide staff with Top Secret and Secret security clearances to ensure their ability to work in secure DISA facilities. The contractor must provide the following functionality:
• Discovery: The system must have the ability to automatically discover new content as it is created on the internet. The discovery engine should plug into hundreds of different source sites (e.g. major search engines and feed aggregators), leveraging the content that these companies have collected. The discovery engine shall pull only the content that meets the search criteria of the client, thereby providing the client with the ability to create highly relevant data streams from a wide variety of source sites. Once content is collected, the user can tune collection to do full site crawls of any domain. New discovery sources can be added within hours upon client request. This is an important aspect of the technology platform, as new content sources are created almost weekly (e.g. a new Mandarin search engine, Foursquare).
• Source types: Ability to collect any open source data including but not limited to mainstream media, social media, blogs, forums, Twitter, shopping, sharing, networking, photos, audio, video, etc. In short, the system can collect the entire open source intelligence market.
• Media types: The system must be able to collect, parse and ingest nearly any media type in both open source and proprietary environments (text, HTML, video, audio, PDF, Microsoft Office, email, etc.).
• Languages: The system must have the capability to collect, parse and ingest open source data in multiple languages including but not limited to English, French, Arabic (Sudanese, Pashto, Urdu, Farsi), Somali, Bengali, Portuguese, Hindi, Russian, Japanese, Mandarin, etc.
• Parsing: Open source technology to enable parsing of documents in multiple formats (HTML, PDF, etc.). Ability to train staff to manage and upgrade parsing libraries as media sources change or are created over time. Ability to do highly complex parsing.
• Scheduling: Crawlers shall be able to be scheduled to crawl based on end user requirements.
• Date slicing: Within the discovery process, the system shall have the ability to use date slicing where appropriate. The search engines shall not cap search results at an arbitrary number (e.g. 500 or 1,000 posts). In the event that a discovery term brings back results from an engine that has capped results, the system must use a date slicing approach, whereby the search terms are run using a date filter on the search engine and data is collected for every day going back in time as far as the user desires. This will reduce the number of results returned by the engine for each query and in most cases lead to complete collection coverage. In the instance where a discovery term is capped on a daily search, the application shall refine the discovery term into more targeted searches to ensure all content can be collected. These targeted terms shall be rolled back up in the system to a single agent, thereby providing consolidated results.
• Image recognition: The system shall automatically recognize images within a piece of content and notify the developer and/or analyst. The system shall be configured to extract these images and send them to a team of analysts who can read and/or tag the images. The text and/or tag associated with the image is embedded into the post and indexed. This shall ensure that an analyst can react to a potentially threatening piece of content embedded in an image within an otherwise benign post. More importantly, this is the type of data issue that any system must be able to adjust to.
• Collection architecture: Distributed collection architecture, from fetching to parsing to reporting, enabling the system to distribute work across tens of thousands of threads and associated boxes to achieve results in a fraction of the time of a linear process. Designed with built-in capacity for horizontal scaling, and built on Linux leveraging cloud technology to offer high performance with relatively low-cost scalability.
• QA processes: The scripts shall be designed to report wherever they fail. They shall have suites of QA tools, built specifically for this application, that allow the system to ensure the quality and accuracy of collected data. The system shall allow complete transparency into any and all collection issues. Every crawler shall be measured in the QA process based on the amount or types of historical data collected. This shall allow the system to automatically flag a spider that returns a data set that differs materially from its historical collection. This eliminates the potential for rogue spiders and/or rogue data.
• Audit alerts: The system shall have a built-in and configurable audit alert process that notifies a developer and/or analyst when an abnormal event is identified in the collection process. This shall allow the developer to make fixes prior to data ingestion.
• Logging: Every spider shall be tracked and logged at every stage of the development, testing and collection process. Every site and piece of parsed metadata shall be logged so a client can easily report on exactly what was collected, when it was collected and where it was collected. The logging shall be tied into a development platform that allows an administrator to see and track all development changes to a spider. This shall provide oversight of the development team.
• Content IDs: Each spider and all associated content shall be tied to unique IDs that enable an analyst and/or developer to easily remove content from the system and/or disable a crawler with no impact to the overall collection process. This shall provide the Government with the ability to easily remove content and/or disable collection in any area of the system with zero downtime.
• Recursive mining: This system shall be designed to support recursive web mining, which is the process of discovering and collecting content, parsing out key metadata attributes from the content, and using those attributes as seed data for a new discovery search.
Anonymization
• Proprietary global proxy integration: The system must be integrated with a global proxy network with capabilities to collect data in an anonymous manner. The proxy network shall meet the following requirements:
  o Collect data from a global network of proxy servers
  o Route crawlers on demand through specific geo-locations as directed by the end user
  o Anonymously collect data from SSL sites
  o Support a crawling scheduler (i.e. when a crawler runs and how it behaves)
  o Be based on open source technology
  o Be able to work in virtual server and/or cloud based environments
  o Provide a centralized application for routing traffic through the proxy network as well as reporting all traffic incidents
  o Stage honey pot proxies to collect information on sites that are trying to determine where collection is coming from
  o Stage honey pot sites with dynamic content pulled from the web to misinform site operators as to the type of crawler collecting from their site. E.g., creating a fully functional blog search engine in Arabic, whereby anyone researching where crawlers are from will see that the site is a blog aggregator.
  o Work in conjunction with a collection infrastructure. As collection scales, IPs get blocked, so the more the collection system scales, the more the anonymization should scale along with it.
  o Un-throttled access to a single site: This shall provide the Government with the ability to target up to tens of millions of queries per hour on a single site if necessary.
Technologies
• Open source technology: The application shall leverage the latest in open source and highly distributed technologies. The technologies employed in the system shall include but not be limited to Python, Hadoop, MapReduce, Solr, MongoDB, etc.
• Third party technology: The application shall have a modular architecture and shall have the ability to work with third party technologies such as Oracle, Autonomy, Microsoft, etc.
• Loosely coupled architecture: The application shall be loosely coupled, providing the ability for a client to plug in third party or proprietary technologies at any point of the process. The end user shall be able to plug in at the appropriate module or component to meet their needs.
• Interoperable: The application shall be interoperable with other legacy and/or emerging technologies at each module and/or component level. The interoperability shall ensure that as new technologies are developed, they can be readily integrated into the system.
• Tech transfer: The application and/or any of its modules shall have the ability to be migrated to run behind the client's firewall. This shall enable the Government to have control over the entire application.
• Language translation: The application shall be able to integrate with third party language translation. The application shall provide the end user with the ability to convert discovery searches, search queries and/or document results from one language to another on demand. Open source language translation is preferred.
• Reporting: The application shall provide transparent reporting across the entire process. The Government must be able to report on any portion of the collection process in real time.
• Compliance: The application shall be designed to provide custom compliance configuration dependent on the Government's objective. The Government shall be able to choose how the data will be viewed and what data the user wants to view. Examples:
  o User profiles: The ability to extract user profiles from messages
  o Keywords/concepts: The ability to eliminate any content with specific words or phrases
  o Site: The ability to white list/black list content from specific sites
  o Source type: The ability to eliminate content from specific source types (e.g. video)
• Flexible discovery and collection: Content shall be discovered and collected in a number of ways, including but not limited to:
  o Topic (in any language)
  o Keyword/concept (in any language)
  o Geographic region
  o Source sites
  o RSS feeds
  o Third party feeds and APIs
SPECIAL REQUIREMENTS: Must have Top Secret clearance.
SOURCES SOUGHT: The anticipated North American Industry Classification System (NAICS) code for this requirement is 541511, with a size standard of $25.5M. This Sources Sought is requesting responses ONLY from small businesses that can provide the required services under the NAICS code. To assist DISA in making a determination regarding the level of participation by small business in any subsequent procurement that may result from this Sources Sought, you are requested to respond by providing a capabilities statement that speaks to this requirement. You are also encouraged to provide information regarding your plans to use joint venturing (JV) or partnering to meet each of the requirements areas contained herein. This includes responses from qualified and capable Service-Disabled Veteran-Owned Small Businesses, Women-Owned Small Businesses, HUBZone Small Businesses, and 8(a) companies. You should provide information on how you envision your company's areas of expertise and those of any proposed JV/partner would be combined to meet the specific requirements contained in this Sources Sought.
In order to make a determination for a small business set-aside, two or more qualified and capable small businesses must submit responses that demonstrate their qualifications. Responses must demonstrate the company's ability to perform in accordance with the Limitations on Subcontracting clause (FAR 52.219-14).
SUBMISSION DETAILS: Responses should include:
1) Business name and address;
2) Name of company representative and their business title;
3) Type of small business;
4) CAGE Code;
5) Contract vehicles that would be available to the Government for the procurement of the product and service, to include ENCORE II, General Services Administration (GSA), GSA MOBIS, NASA SEWP, Federal Supply Schedules (FSS), CIO-SP3 SB or any other Government agency contract vehicle.
Vendors who wish to respond to this Sources Sought should send responses via email by 11:00 AM Eastern Daylight Time (EDT) on Jan 11, 2013 to KimOanh.P.Scott.Civ@mail.mil and Suzanne.M.Rippenbaum.civ@mail.mil. Interested businesses should submit a brief capabilities statement package (no more than ten pages) demonstrating ability to perform the services listed in this Technical Description. Proprietary information and trade secrets, if any, must be clearly marked on all materials. All information received that is marked Proprietary will be handled accordingly. Please be advised that all submissions become Government property and will not be returned. All Government and contractor personnel reviewing Sources Sought responses will have signed non-disclosure agreements and understand their responsibility for proper use and protection from unauthorized disclosure of proprietary information as described in 41 USC 423. The Government shall not be held liable for any damages incurred if proprietary information is not properly identified. Questions or clarifications to this Sources Sought must be submitted, via email, to KimOanh.P.Scott.Civ@mail.mil, the Contracting Specialist, and Suzanne.M.Rippenbaum.civ@mail.mil.
The opportunity for clarification of this Source Sought will not change the submission date identified above. Oral communications are not permissible. Marketing brochures and/or generic company literature will not be considered. FedBizOpps will be the sole repository for all information related to this announcement.
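For illustration only, the date slicing capability described under REQUIRED CAPABILITIES could be sketched as follows. This is a minimal sketch under stated assumptions, not a Government-specified design: the function names, the `search(term, day)` and `refine(term)` callables, and the `(result_id, text)` result shape are all hypothetical and do not come from this notice.

```python
from datetime import date, timedelta


def daily_slices(start, end):
    """Yield each day from `end` back to `start`, newest first."""
    day = end
    while day >= start:
        yield day
        day -= timedelta(days=1)


def collect_with_date_slicing(search, term, start, end, cap, refine):
    """Run `term` one day at a time so a capped engine returns fewer hits per query.

    Assumptions (hypothetical interfaces):
      search(term, day) -> list of (result_id, text), truncated by the engine to `cap`
      refine(term)      -> list of narrower terms to try when one day is still capped
    Results from the narrower terms are rolled up under the original term.
    """
    rolled_up = {}
    for day in daily_slices(start, end):
        hits = search(term, day)
        batches = [hits]
        if len(hits) >= cap:
            # Even a single day hit the cap: rerun the day with narrower terms.
            batches += [search(narrower, day) for narrower in refine(term)]
        for batch in batches:
            for result_id, text in batch:
                rolled_up[result_id] = text  # keyed by ID, so duplicates collapse
    return rolled_up
```

A caller would supply its own `search` and `refine` functions for a given source site. Keying the consolidated results by ID is one simple way to roll the targeted terms back up into a single result set.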
 
Web Link
FBO.gov Permalink
(https://www.fbo.gov/spg/DISA/D4AD/DTN/IV2_SOURCES_SOUGHT/listing.html)
 
Place of Performance
Address: Work is to be performed primarily at the contractor's facility and at DISA Fort Meade, MD., Fort Meade, Maryland, 20755, United States
Zip Code: 20755
 
Record
SN02954747-W 20121223/121221234948-4d338270bebebbd181b222ad7ca00bf3 (fbodaily.com)
 

© 1994-2024, Loren Data Corp.