The majority of the position's activities are as a systems programmer and technology implementer for the National Resource for Digitization of Biological Collections (iDigBio) project. Design, implement, and support complex ETL mappings to migrate large data volumes from heterogeneous source systems into a central NoSQL data store. Develop and use tools to perform data analytics, data manipulation, and reporting according to data consumer needs, and participate in the design of new or changing data mappings and workflows, evolving the iDigBio data model as data standards are updated and data volumes grow. Produce technical specifications and documentation to communicate effectively with data providers and consumers.
Develop software for data-related cloud middleware and web portals. The incumbent will design, implement, and maintain storage, infrastructure, platform, and software clouds, including software and hardware selection. Integrate external cloud and distributed data resources with resources developed as part of the projects. Collect and report performance and quality metrics to ensure resources are meeting project goals. Create documentation and software packages to make the work usable by other institutions. Train collaborators and end users on the cloud and software resources created. Liaise with developers and users of the biological collection management systems that provide data to iDigBio to gain an understanding of data requirements and functionality, to address data transformation issues, to develop tools that facilitate data quality improvement, and to enable bi-directional data flows.
Coordinate ingestion processes across national and international partner organizations. Interpret iDigBio project needs, stakeholder requests, and PI directives to drive prioritization and triage of tasks and issues. Interface directly with data providers when technical complexity exceeds the capabilities of mobilization staff. Collaborate with national and international organizations as needed to improve data quality and data sharing standards. Collaborate with Partner Projects and potential new partners through both in-person and virtual meetings. Manage the Data Ingestion and Mobilization meeting.
Assist in maintaining existing computer, networking, and software infrastructure in the ACIS laboratory. Integrate infrastructure developed as part of the project into the overall resource offerings of the ACIS laboratory. Document best practices and develop technical support materials for ACIS hardware and software.
- A 4-year degree in computer science, or a 4-year degree in a biological sciences field with a significant IT background.
- 3+ years of experience designing, building, and maintaining data integration software and heterogeneous data systems.
- Strong programming skills with proficiency in multiple programming languages/environments (Java, PHP, Python, SQL, stored procedures, JPA, NoSQL, REST), operating systems (Linux, Windows, macOS), data stores (relational, object-oriented, flat-file, and document-oriented), data format/protocol standards (ODBC/JDBC, JSON, CSV, XML/XSLT), and data modeling/mapping tools.
- Demonstrated skills and abilities sufficient to perform primary design and interface responsibilities for organization-wide systems.
- Knowledge of:
- the theoretical and practical application of a highly specialized body of knowledge in information technology.
- collections data and collections data management, existing and emerging data standards, and cutting-edge techniques for data transformation, representation, and big data analysis.
- metadata and data representation standards: Dublin Core, Darwin Core, OWL, XML, JSON, RDF.
- semantic technologies (OWL, RDF, BFO) and standards such as Darwin Core, Audubon Core, EML, Dublin Core, EnvO, PCO, OBO, GO, and MIxS (a plus).
- Ability to:
- apply technology within and outside the body of knowledge and specialty of this position.
- understand the needs of the broader community and the state of the industry in order to design significant integrated solutions that successfully address the depth and scope requirements of all customers.
- autonomously analyze complex problems; identify critical elements and alternatives, and organize existing resources and new information to implement the most appropriate solution.
- perform, with minimal supervision, the following tasks:
- the installation and maintenance of software and debugging of code in Python, Node.js, and shell scripting languages.
- the installation and maintenance of database systems, including SQL and NoSQL databases.
- the installation, configuration and maintenance of server hardware and cloud resources.
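As an illustration of the metadata standards named above, a record shared with iDigBio is typically expressed using Darwin Core terms and serialized in a format such as JSON. The sketch below is a minimal, hypothetical example (the term names follow the Darwin Core vocabulary, but the values and the flat layout are illustrative, not an iDigBio schema):

```python
import json

# Illustrative sketch: a minimal occurrence record using Darwin Core
# term names, serialized as JSON. Values are hypothetical examples,
# not taken from iDigBio data.
record = {
    "dwc:occurrenceID": "urn:uuid:example-0001",   # hypothetical identifier
    "dwc:scientificName": "Puma concolor",
    "dwc:basisOfRecord": "PreservedSpecimen",
    "dwc:eventDate": "1999-07-04",
    "dwc:country": "United States",
}

# Serialize for exchange with a data consumer or a document store.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

The same terms can equally be carried in CSV (as a Darwin Core Archive) or XML/RDF; JSON is shown here only because it maps directly onto document-oriented NoSQL stores.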