Job Description
• Design, develop, and maintain efficient, reusable, and reliable Python code for web crawling and scraping.
• Build and manage robust scraping systems that process large amounts of unstructured data.
• Implement systems to ensure data quality and integrity.
• Collaborate with data science teams to assist in cleaning and structuring scraped data for analysis.
• Handle the storage and indexing of scraped data in databases.
• Ensure compliance with data protection regulations and best practices.
• Optimize existing systems for improved performance and scalability.
• Stay updated with new technologies and advancements in web scraping and data extraction.
You will be responsible for
• Develop, test and maintain high-quality software using the Python programming language.
• Participate in the entire software development lifecycle, building, testing and delivering high-quality solutions.
• Collaborate with cross-functional teams to identify and solve complex problems.
• Write clean and reusable code that can be easily maintained and scaled.
• Write efficient, transparent and well-documented code that meets industry regulations and standards.
• Ensure proper adherence to privacy and security standards.
• Create large-scale data processing pipelines to help developers build and train novel machine learning algorithms.
• Participate in code reviews, ensure code quality and identify areas for improvement to implement practical solutions.
• Debug code when required and troubleshoot any Python-related issues.
• Keep up to date with emerging trends and technologies in Python development.
You need to have
• Bachelor's degree in Computer Science, Information Technology, or a related field.
• Minimum of 3 to 5 years of experience as a Python Developer.
• Proven experience in web crawling and scraping using Python.
• Strong experience with Python libraries such as Scrapy, Beautiful Soup, and Selenium.
• Proficiency in handling HTML, XML, and JSON data formats.
• Solid understanding of HTTP and request/response structures.
• Experience with database systems, both SQL (e.g., MySQL, PostgreSQL) and NoSQL (e.g., MongoDB).
• Knowledge of data cleaning and preprocessing techniques.
• Familiarity with web scraping ethics and legality.
• Experience with version control tools, preferably Git.
• Knowledge of distributed data processing is a plus (e.g., Apache Kafka, Spark).
• Experience with cloud services like AWS, Google Cloud, or Azure for deployment and scaling.
• Familiarity with Docker and containerization.
• Background in machine learning and AI technologies for data processing.
About Us
NSE Cogencis is a leading provider of data, news, actionable insights and analytics. Professionals across commercial banks, asset management companies, insurance companies, conglomerates and large corporates use our products to trade, manage funds and hedge risks. As part of the NSE Group and a 100% subsidiary of NSE Data, we play an important role in the Indian financial market ecosystem.
Curiosity is our biggest asset and it’s in our DNA. Our curiosity to understand market trends and the challenges faced by today’s market professionals drives us to build and manage the most comprehensive database on the Indian financial market, bring exclusive market-moving news to our platform and continuously upgrade our analytical capabilities. It is CURIOSITY that drives everything we do at Cogencis. Together we learn, innovate and thrive professionally.
We are an equal opportunity employer, and we strive to create a workplace that is not only employee-friendly but also puts our employees at the centre of our organisation. The wellbeing and mental health of our employees are a clear priority for us at NSE Cogencis.