Federated Search Directories

1 July 2024 – OpenWebSearch.eu is the European counterpart to Common Crawl. The goal: an open web index (+ services) = data sovereignty for Europe. The project is EU-funded and in turn funds community projects. Unfortunately, I discovered the call too close to the application deadline, so my idea (now rejected) was correspondingly shot from the hip.

Title

Target Field of the Research

Curation of search result sets: Are end users willing and able to build useful, valuable search directories for their favorite topics or areas of expertise?

Federation: Will the pool of directories improve through collaboration?

Approach and main challenges: describe your approach, methodology

This project will investigate whether old-school search directories, curated by expert users, could be a feasible alternative to today’s paradigm of “10 blue links per page, ranked by some algorithm”.

It will empower end users to help their fellow humans navigate the web in a new (old) way, and will hopefully show them the beauty and knowledge that the “long tail” of the web holds.

This approach goes back to a time when the web was so small that books were printed with lists of recommended starting points (I still have my copy of O’Reilly’s “The Whole Internet”), when Yahoo established its famous web directory, and when this format was the status quo (I worked on one of the oldest such sites that is still online), because literally anybody with a little knowledge of basic HTML could build one.

Approach and main challenges: expected outcomes, relevance

Mainstream search engines are plagued by SEO-driven and redundant content. They also have business interests that run counter to those of their users.

If people can curate and share their own favorite search results and starting points, we might recapture the experience of a web that felt new and exciting, rather than one dominated by the same 10 big companies every time you look something up.

If we then aggregate this “human” signal, we might find out what people really like. It would take us back to a time when links were endorsements, not paid placements.
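One simple way to read “aggregate this human signal” is to count how many distinct curators include a URL in any of their bundles. The sketch below is a minimal illustration, assuming bundles arrive as hypothetical `(curator, urls)` pairs; counting curators rather than bundles keeps a single user from inflating a URL by listing it repeatedly.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def endorsement_counts(bundles: Iterable[Tuple[str, List[str]]]) -> List[Tuple[str, int]]:
    """Count distinct curators per URL (hypothetical input: (curator, urls) pairs).

    A curator who lists the same URL in several bundles is counted once,
    so each person contributes at most one endorsement per URL.
    """
    curators_per_url = defaultdict(set)
    for curator, urls in bundles:
        for url in urls:
            curators_per_url[url].add(curator)
    # Most-endorsed URLs first; ties broken alphabetically for stable output.
    return sorted(
        ((url, len(curators)) for url, curators in curators_per_url.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


bundles = [
    ("alice", ["https://example.org/a", "https://example.org/b"]),
    ("bob", ["https://example.org/a"]),
    ("alice", ["https://example.org/a"]),  # alice's repeat is not counted twice
]
print(endorsement_counts(bundles))
# [('https://example.org/a', 2), ('https://example.org/b', 1)]
```

A weighted variant (e.g. trusting long-standing curators more) would be one of the “new signals” worth testing.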

Describe the contribution to the component(s) in detail

Search Applications: Starting from a known URL or a given query, we give users the opportunity to mix and match relevant information with similar search results. I’d also like to test whether a “random walk” component (like the random surfer in the original Google PageRank formula) could help prevent “choice overload” for users.
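A random-walk component could look like personalized PageRank (a random walk with restart): instead of showing every related URL, show only the few a walker starting at the user’s URL visits most often. This is a sketch under stated assumptions: URLs are linked whenever they co-occur in a bundle (an assumption, not part of the proposal), the damping value 0.85 is the one commonly cited for PageRank, and `cooccurrence_graph`, `personalized_pagerank` and `suggest` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def cooccurrence_graph(bundles: List[List[str]]) -> Dict[str, Set[str]]:
    """Assumption: link two URLs whenever they appear in the same bundle."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for urls in bundles:
        for a in urls:
            for b in urls:
                if a != b:
                    graph[a].add(b)
    return graph


def personalized_pagerank(graph: Dict[str, Set[str]], seed: str,
                          damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Random walk with restart, computed by power iteration.

    With probability `damping` the walker follows a random edge; otherwise
    it jumps back to `seed`. Nodes without edges also send the walker back.
    """
    nodes = set(graph) | {seed}
    for neighbours in graph.values():
        nodes |= neighbours
    rank = {node: 0.0 for node in nodes}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, score in rank.items():
            nxt[seed] += (1.0 - damping) * score  # restart at the seed
            neighbours = graph.get(node)
            if neighbours:
                share = damping * score / len(neighbours)
                for nb in neighbours:
                    nxt[nb] += share
            else:
                nxt[seed] += damping * score  # dangling node: restart too
        rank = nxt
    return rank


def suggest(graph: Dict[str, Set[str]], seed: str, k: int = 3) -> List[str]:
    """Return at most k reachable URLs, most visited first (seed excluded)."""
    rank = personalized_pagerank(graph, seed)
    candidates = [(u, s) for u, s in rank.items() if u != seed and s > 0]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [u for u, _ in candidates[:k]]


graph = cooccurrence_graph([["a", "b", "c"], ["a", "d"], ["e", "f"]])
print(suggest(graph, "a", k=2))  # ['b', 'c']
```

Capping the output at `k` is what addresses choice overload; the restart probability controls how far from the starting point suggestions may drift.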

Users can then build URL bundles that have their own URL and will be hashtag-able, searchable, cloneable (notifying the original author), shareable, embeddable on other sites, and so on. The search result set becomes a “social object” with a REST API.
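A URL bundle as a social object might be modelled roughly as below. Everything here is a hypothetical sketch: the field names, the `/bundles/<id>` path, and the `notify` callback (standing in for whatever messaging a real service would use) are assumptions; `to_json` shows the kind of payload a REST endpoint could return.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Optional


@dataclass
class UrlBundle:
    """Hypothetical shape of a curated URL bundle."""
    title: str
    author: str
    urls: List[str]
    hashtags: List[str] = field(default_factory=list)
    cloned_from: Optional[str] = None
    bundle_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])

    @property
    def path(self) -> str:
        # Each bundle gets its own URL; the /bundles/ prefix is an assumption.
        return f"/bundles/{self.bundle_id}"

    def clone(self, new_author: str, notify: Callable[[str, str], None]) -> "UrlBundle":
        """Copy the bundle for another user and notify the original author."""
        copy = UrlBundle(
            title=self.title,
            author=new_author,
            urls=list(self.urls),
            hashtags=list(self.hashtags),
            cloned_from=self.bundle_id,  # keeps provenance for attribution
        )
        notify(self.author, f"{new_author} cloned {self.path}")
        return copy

    def to_json(self) -> str:
        """Payload a hypothetical GET /bundles/<id> endpoint might return."""
        data = asdict(self)
        data["path"] = self.path
        return json.dumps(data, sort_keys=True)


def find_by_hashtag(bundles: List[UrlBundle], tag: str) -> List[UrlBundle]:
    """Case-insensitive hashtag search; a leading '#' is optional."""
    wanted = tag.lower().lstrip("#")
    return [b for b in bundles
            if wanted in (h.lower().lstrip("#") for h in b.hashtags)]


sent = []
original = UrlBundle("Rust learning paths", "alice",
                     ["https://doc.rust-lang.org/book/"], ["#rust"])
copy = original.clone("bob", notify=lambda to, msg: sent.append((to, msg)))
print(sent)  # one notification addressed to alice
```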

With similarity metrics and vector-based representations, we could then both recommend new URLs entering the search index and build more detailed hierarchies and tag taxonomies (“tag-onomies”) of the user-generated content.
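One way to recommend a newly indexed URL is to compare its embedding with the centroid of each bundle’s embeddings using cosine similarity. This is a minimal sketch assuming embeddings are already available from some unspecified model; the toy 3-dimensional vectors, the bundle names and the 0.8 threshold are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a bundle's URL embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recommend_bundles(new_vec: Vector, bundle_vectors: Dict[str, List[Vector]],
                      threshold: float = 0.8) -> List[str]:
    """Bundles whose centroid is at least `threshold` similar, best first."""
    scored = []
    for bundle_id, vectors in bundle_vectors.items():
        if not vectors:
            continue  # empty bundles have no centroid
        score = cosine(new_vec, centroid(vectors))
        if score >= threshold:
            scored.append((score, bundle_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [bundle_id for _, bundle_id in scored]


bundle_vectors = {
    "cooking": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "astronomy": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
print(recommend_bundles([1.0, 0.05, 0.0], bundle_vectors))  # ['cooking']
```

The same similarity scores could feed a clustering step to derive the hierarchies and tag groupings mentioned above.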

Search Paradigms: As described above, this project will answer the question of whether search directories are still useful in 2024 or could become more prominent in the future. Can we find new signals or weights in this user-generated data?