Developer Forum
Technical teams at ECI headquarters and field offices are doing a lot of R&D on various issues, especially Indian name databases in various Indian languages, fuzzy matching, and GIS. In the public interest, the following material is being placed in the public domain under an MIT-style license.
To learn how to import XML files into traditional databases, please click here.
For queries or suggestions, please contact webmaster@eci.gov.in.
Indic Languages
A major portion of the R&D has gone into migrating from ISCII/ISFOC to Unicode for various Indian languages and enabling them on the web. Wikipedia has an excellent page on Indic Languages. To see sample usage, follow the various subsections on this page, as well as the Search Electoral Rolls section and Genesys Archives.
You may like to download the OpenType Fonts and Keyboards for Indic Languages released by CDAC.
The ISCII to UNICODE Converter by CDAC and the SIL Converter by the Summer Institute of Linguistics are also very useful tools. The SIL Converter allows you to go from Unicode to ISCII, as well as to English.
Frequency Analysis of Names and Dictionaries
The following tables and XML files of names have been prepared by cleaning the raw names. Tables for one language are shown here, both for full names with frequency and for tokens (obtained by breaking names into separate words) with frequency. Dictionaries are of three types: English to Vernacular, Vernacular to English, and Two-Way.
Full Name Frequency Analysis (Zipped MDB files)
Broken Name Frequency Analysis (Zipped XML files)
Dictionaries (Zipped XML files)
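The two kinds of frequency table described above can be illustrated with a minimal Python sketch. This is not the ECI cleaning procedure: the function name name_frequencies and the cleaning rule (collapse whitespace, fold to upper case) are assumptions made for illustration only.

```python
from collections import Counter

def name_frequencies(names):
    """Return (full-name counts, token counts) for a list of raw names.

    Assumed cleaning rule: collapse runs of whitespace and fold to upper case.
    """
    cleaned = [" ".join(n.split()).upper() for n in names if n.strip()]
    full = Counter(cleaned)                                      # whole names
    tokens = Counter(tok for n in cleaned for tok in n.split())  # broken names
    return full, tokens

# Hypothetical sample input
full, tokens = name_frequencies(["Ram Kumar", "ram  kumar", "Sita Kumar"])
print(full.most_common())
print(tokens.most_common())
```

Counting full names and tokens separately mirrors the split between the "Full Name" and "Broken Name" files listed above.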
Name Translation
Name Translation from one language to another is based on the Dictionaries available above.
Name Translation Demo Source Code
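As a rough illustration of dictionary-based translation, the sketch below looks up each token of a name in an English-to-Vernacular mapping and leaves unknown tokens unchanged. The function translate_name and the two dictionary entries are hypothetical; the actual dictionary format and the demo's handling of unknown tokens are not described here.

```python
def translate_name(name, dictionary):
    """Translate a name token by token; unknown tokens are kept as-is."""
    tokens = name.upper().split()
    return " ".join(dictionary.get(tok, tok) for tok in tokens)

# Illustrative entries only, not taken from the released dictionaries
english_to_hindi = {
    "RAM": "\u0930\u093E\u092E",                 # राम
    "KUMAR": "\u0915\u0941\u092E\u093E\u0930",   # कुमार
}

print(translate_name("Ram Kumar", english_to_hindi))
print(translate_name("Ram Xyz", english_to_hindi))
```

A Two-Way dictionary could be handled the same way by building one mapping per direction.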
ASP-Analysis
ASP-Analysis refers to the Age, Sex and Position profile of name tokens and is useful for analyzing name patterns on various parameters.
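One possible way to build such a profile is sketched below in Python. The record layout (full name, age, sex) and the function name asp_profile are assumptions for illustration; the released XML files may be organized differently.

```python
from collections import Counter, defaultdict

def asp_profile(records):
    """Build an Age/Sex/Position profile per name token.

    records: iterable of (full_name, age, sex) tuples (assumed layout).
    Position is the 1-based index of the token within the full name.
    """
    profile = defaultdict(
        lambda: {"ages": [], "sex": Counter(), "position": Counter()}
    )
    for name, age, sex in records:
        for pos, tok in enumerate(name.upper().split(), start=1):
            entry = profile[tok]
            entry["ages"].append(age)
            entry["sex"][sex] += 1
            entry["position"][pos] += 1
    return profile

# Hypothetical sample records
records = [("Ram Kumar", 40, "M"), ("Sita Kumari", 30, "F"), ("Kumar Singh", 20, "M")]
kumar = asp_profile(records)["KUMAR"]
print(sum(kumar["ages"]) / len(kumar["ages"]), dict(kumar["sex"]), dict(kumar["position"]))
```

A profile like this shows, for example, whether a token is used mostly as a first name or a surname, and by which sex and age group.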
ASP Analysis (Zipped XML files)
Approximate Matching of Indian Names (AMIN)
A number of fuzzy (approximate) matching algorithms are available (see http://en.wikipedia.org/wiki/Fuzzy_string_searching). However, none of them really works well for Indian names. Fuzzy matching is critical for the Search Engine and, more importantly, for de-duplicating databases by pointing out names that occur more than once. AMIN, developed by the ECI technical team, is an algorithm that matches two words and reports a matching percentage, and it handles the variations in Indian names very well. It works not only in English but is also easily customizable to any language.
Main uses of AMIN are:
- Match two strings/words Demo Source Code
- Full Name Matching Demo Source Code
- Suggestions from Names/Surnames Database Demo Source Code
- Duplicate Names found by using AMIN Sample PDF file
Adaptation of AMIN in Indic Languages:
- Match two strings/words in Unicode Devanagari Demo Source Code
- Suggestions from Names/Surnames Database in Unicode Devanagari Demo Source Code
- Suggestions from Names/Surnames Database in Unicode Punjabi Demo Source Code
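The AMIN algorithm itself is not reproduced on this page. To show the general shape of the uses listed above (a matching percentage for two words, and suggestions from a names database), here is a minimal Python sketch that substitutes the standard library's difflib.SequenceMatcher for AMIN. The function names and the 80 percent threshold are assumptions, and this stand-in does not model the Indian-name variations that AMIN is designed for.

```python
from difflib import SequenceMatcher

def match_percentage(a, b):
    """Similarity of two words as a percentage (generic stand-in, not AMIN)."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a and not b:
        return 100.0
    return round(100 * SequenceMatcher(None, a, b).ratio(), 1)

def suggest(query, names, threshold=80.0):
    """Return names scoring at or above the threshold, best match first."""
    scored = [(match_percentage(query, n), n) for n in names]
    return [n for score, n in sorted(scored, reverse=True) if score >= threshold]

# Hypothetical surname list
print(match_percentage("Sharma", "Sarma"))
print(suggest("Sharma", ["Sarma", "Verma", "Sharma"]))
```

De-duplication can be sketched the same way by scoring every pair of names in a part and reporting pairs above the threshold.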
Golu Finder & Golu Cleaner
When legacy data in Indian languages is ported into Unicode, one finds many "dangling matras", which are dependent vowel signs whose corresponding consonant is missing. These appear in Unicode vernacular data as "Golus", which are matras attached to a dotted circle. The Golu Finder routine finds such patterns in data, and Golu Cleaner cleans the data to the extent possible. This has been highly useful in converting about 500 million names in various Indian languages from legacy fonts into Unicode, and we hope it will be equally useful to IT persons working in Indian languages.
Source Code of Golu Finder
Source Code of Golu Cleaner
Sample data showing the use of Golu Cleaner
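The idea of a dangling matra can be illustrated with a short Python sketch for Devanagari. It is not the released Golu Finder or Golu Cleaner code: the names find_golus and clean_golus are hypothetical, only the vowel signs U+093E to U+094C are treated as matras, and the cleaning rule (simply delete the dangling matra) is an assumption.

```python
import re

# Assumption: Devanagari only. Consonants are U+0915-U+0939 and U+0958-U+095F;
# a nukta (U+093C) may sit between a consonant and its matra.
CONSONANTS = "\u0915-\u0939\u0958-\u095F"
NUKTA = "\u093C"
# Dependent vowel signs (matras), U+093E-U+094C.
MATRAS = "\u093E-\u094C"

# A matra is treated as dangling when the character before it is neither
# a consonant nor a nukta (this includes a matra at the start of the text).
DANGLING_MATRA = re.compile(f"(?<![{CONSONANTS}{NUKTA}])[{MATRAS}]")

def find_golus(text):
    """Return (index, character) for every dangling matra in text."""
    return [(m.start(), m.group()) for m in DANGLING_MATRA.finditer(text)]

def clean_golus(text):
    """Remove dangling matras (assumed cleaning rule: drop them)."""
    return DANGLING_MATRA.sub("", text)

sample = "\u0930\u093E\u093E\u092E"   # RA + AA + AA + MA: the second AA dangles
print(find_golus(sample))
print(clean_golus(sample) == "\u0930\u093E\u092E")
```

Other scripts could be handled by swapping in the consonant and matra ranges of the corresponding Unicode block.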
Electoral Roll Database
Electoral rolls were first hosted on the ECI and CEO websites in 2004 as PDF files in various languages. Since 2005, the National Electoral Search Engine has been developed, and a search facility in various languages has been made available. The Search Engine now also offers a "View Rolls Part by Part" facility.
Locational Data
Many journalists and students need to know which villages or cities, or portions thereof, are covered by each Assembly or Parliamentary constituency. Our Electoral Rolls database contains references to the major village or city covered by each part of the roll. Locational databases compiled from the Electoral Rolls database are available here.
Use of GIS
For details on usage and for Election GIS Layers, please click here.
Sample code for mashing up a data source with the Google Maps API (Text File)
FLOSS (Free/Libre/Open-Source Software)
We encourage the use of Free/Open Source solutions. Although databases at the ECI and State levels are consolidated in Oracle, we encourage smaller databases at the ERO/DEO level to use MySQL, Postgres, etc. We are big supporters of OpenOffice.org. Election machinery across the country is also encouraged to use freeware such as IrfanView for imaging, and GRASS, MapServer, the Google Maps API, etc. for GIS. Our databases are now mostly in XML, and in this section too we have released most of the databases in XML format rather than in any proprietary format.
Use of IT during elections
Over the years, the Election Commission has introduced a number of online applications for displaying Results/Trends and for data analysis. See State Election 2006, State Election Feb 2007, State Election Dec 2007, and State Election 2008 as samples.
