How did I get here? An accidental bioinformatician’s story

Eva Tosco-Herrera
PhD candidate at Hospital La Candelaria

When people ask me what I wanted to be when I grew up, I honestly do not know what to tell them. For some people, their current job is an existential dream: something they were born with, a heartfelt desire, an inherent part of their personality. The truth is, that is not my case, and it does not have to be yours for you to become who you want to be. As the title may suggest, nowadays I am what you could call a bioinformatician. I am not even sure about that yet. I did touch a few computers while growing up, and I did live among nature, trees, mud, dragonflies, and crickets. But I learned very late that the two things I loved could be merged into a job I could dedicate my life to, or even that I actually wanted that kind of job.

The first thing that comes to mind is probably second grade. My mother luckily had the time to invest in me, and I learned to read when I was 3. My teacher was impressed with my reading skills back then and told me I would make a great lawyer, since lawyers need to read a lot, very fast. That stuck with me for a while, and it was not until fourth grade that I learned that a lawyer is supposed to look out for justice at trials, to fight for the weak, to face the powerful. It took me a while to realize that what I had imagined was not the whole truth: lawyers could also defend guilty people. It was such a disappointment. But after a while of feeling lost, I began to look for other options. I was not sure where to start, so I just listened, and I started answering “I really do not know”.

In fifth grade, my school organized a practical class with a few microbiology technicians. They taught us how to streak bacteria on a Petri dish, and not just any bacteria: our own bacteria, from our tongues, our noses, our hands. One of the technicians took a sample from my ear with a Q-tip and streaked the agar with it, leaving zig-zag marks on the gelatin. Two weeks later, I vividly remember staring at those white spots on the red agar for the entire hour. I was fascinated. A whole unseen world presented itself in front of my eyes in the most outstanding way, and I could not look away. My perspective of the world changed dramatically and forever. I suddenly realized that there was so much more than the eye could see.

I used to be very good at subjects such as Spanish, History, even English and Literature. However, I was not that good at Math, Science, Physics, or Chemistry. The first ones bored me to death, and the last ones frustrated me but intrigued me at the same time, as a challenge. That was not reflected in my grades, but I was sure of what I liked, and when Basic Genetics showed up, I immediately thought, “this is what I want to do”. I wish I could say I have not had doubts along the way, but I would be lying. I have always been drawn to plants, as well as other organisms. When I started my Biology degree, I was almost exclusively interested in Genetics, but other interests began to arise along the way. By the time I had to choose a topic for my final project, my wish was to do research in Botany. I chose not one but three botany-related topics, but all of them became unavailable because other people with higher grades had already taken them. This led me to choose a biostatistics project as a fourth option, which later turned out to be bioinformatics. My project supervisor's aim and mine was to adapt and extend an awesome online application for managing Nanopore sequencing data, originally developed by my current colleagues and my current PhD supervisor. I got really interested, really enjoyed the project, and I contacted what is now the research group I work in.

What I really want you to take away from my personal story, which may or may not have been of interest to you (sorry if it is the latter), is that you do not have to have it all figured out from the beginning to be successful, or at least to end up doing what you love. I did not know I was going to end up here, and I have felt lost, lonely, a misfit, a failure. Multiple times. But here I am now, by luck, coincidence, or fate, meeting awesome people, facing problems but solving them soon enough and feeling satisfied, discovering new knowledge, and feeling useful for the greater good. This could also be you in the near future! Life finds a way to get you where you need to be, to learn what you need to learn. Trust the process, be optimistic. You are enough, you are worthy, and you will be whoever you want to be. You can do everything, but do not forget the basics: stay healthy, stay focused, but above all enjoy every minute. Time is limited and no moment will ever come again.


Working with notebooks and Jupyter environments in bioinformatics

Héctor Rodríguez
PhD candidate at Hospital La Candelaria
Eva Suárez
PhD candidate at Hospital La Candelaria

For bioinformatic analysis, we often work directly with the console, with text editors for our scripts, or with environments such as RStudio. However, the use of computational notebooks is increasingly common. In this article, we briefly explain what working with notebooks involves and its advantages, as well as some tools to start integrating Jupyter-based environments into our day-to-day work.

Concept and application in bioinformatics

Notebooks are interactive documents in which we can combine text, executable code in various programming languages, and tables or figures. Their use has become popular in recent years, and it is increasingly common to find information presented in this format and integrated into training environments, thanks to the simplicity and advantages they provide.

They are ideal for bioinformatic analysis in many cases, since they work as an equivalent of the traditional laboratory notebook and let us keep a documented record of our work, whether it is Python/R code, bash commands, or tables and annotations. Moreover, working with notebooks can be easier than handling scripts or the console when carrying out a given analysis, and it is a good way to make our analyses more reproducible.

There are different tools for developing and working with notebooks; below, we describe the main features of two of them: Jupyter Lab and Google Colab.

Jupyter Lab

Jupyter Lab is a web-application-based user interface for working with notebooks. The interface also supports text editors, consoles, and custom components, so it brings together in a single interface all the elements involved in notebook-based data analysis.

Since its launch, the Jupyter project has grown in popularity and, beyond being a working environment, it also serves as a teaching tool. Nowadays it is common to find online course contents or tutorials in notebook format, as well as online environments with everything needed to work from the browser.

  • Take advantage of “magic commands”. Besides native code, we can run other kinds of actions, such as R or Perl code, or system commands, by using a prefix in the code cell (for example: %%bash). These magic commands greatly extend what a notebook can do. You can find more information in the documentation.
  • Install kernels for your favorite languages. By default, Jupyter Lab allows creating and running notebooks with Python code. By installing kernel modules, we can create and work with notebooks natively in other languages such as R, Scala, or bash. You can find the list of available kernels here.
  • Try the extensions. The community is active in developing Jupyter Lab extensions or plugins. Although they are not required, they can make many tasks easier, such as documentation, system monitoring, visualization, and debugging. You can find some interesting extensions in this list.
  • Deploy a Jupyter Lab environment with conda. With the conda tool it is easy to deploy a computing environment with the necessary dependencies and Jupyter Lab, so if you have experience with conda environments, we recommend this method (described as “project-based” in the second part of this article) to try it without installing the packages system-wide.
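As a sketch of that project-based setup (the environment name and package list below are illustrative assumptions, not from the original article), a minimal `environment.yml` could look like:

```yaml
# hypothetical environment.yml for a notebook project
name: notebooks-bioinfo
channels:
  - conda-forge
dependencies:
  - python=3.9
  - jupyterlab
  - pandas
  - r-irkernel   # R kernel, so R notebooks work natively
```

Running `conda env create -f environment.yml`, then `conda activate notebooks-bioinfo` and `jupyter lab`, would start the interface with the listed kernels available, without touching the system-wide packages.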

Jupyter Lab is available for any operating system with Python, and its installation and use are thoroughly documented here.

Google Colab

Google Colab is a tool offered by Google for developing notebooks. It is a hosted Jupyter Notebook service that requires no setup or prior installation. Colab offers an interactive online environment that runs code dynamically in the cloud. This is one of the main differences from platforms such as Jupyter Lab, which are designed to work locally.

The code runs in a virtual machine assigned to the user's Google account. A powerful computer is not needed, since the processing takes place on Google's servers. However, there are some limitations to keep in mind: for example, the virtual machines have a 12-hour lifetime, and their resources may be limited when user demand is high.

Google Colab notebooks are stored in Google Drive, which makes it possible to share, comment on, and collaborate on the same document with several people, easing teamwork; it also allows searching for and importing notebooks from GitHub. If we want to get started with Colab, Google offers notebooks explaining its features, with examples of its application to data science or machine learning.


My quarantine as a bioinformatician

Sarai Varona
Bioinformatician at ISCIII

My name is Sarai, I am 26 years old, I studied a degree in Biochemistry and Molecular Biology and then two master's degrees, one in Bioinformatics and another in Molecular Biology and Biomedicine. I currently work in the Bioinformatics Unit of the Instituto de Salud Carlos III in Madrid, and today I would like to tell you how I lived through the three-month SARS-CoV-2 quarantine, during which I ended up developing a pipeline with one of the communities I most admire in my field: nf-core.

In the Bioinformatics Unit of the Carlos III, where I work, we provide analysis services to the whole center, mainly for sequencing data. Among our many tasks, one is to analyze sequencing data from viral genomes (quite varied viruses, in fact) to obtain the consensus genome of the sample in question.

Isabel Cuesta (head), Sarai Varona and Miguel Juliá, from the Bioinformatics Unit of the ISCIII, where Sara Monzón also works.

As you all know, in March 2020 a state of alarm was declared due to a global pandemic that has affected the entire planet (in case you had not noticed), and our role as the sequencing analysis unit of one of Spain's public health reference centers was to obtain the SARS-CoV-2 consensus genome from its sequencing data.

At that time there was no established protocol for this analysis with Illumina sequencing data, but based on our previous experience with other kinds of viruses, we thought we could take some of the tools we were already using and integrate them into a single analysis workflow using a piece of software called Nextflow. We had already been using Nextflow, which lets you build scalable and reproducible workflows, something very useful in our field; if you do not know it, I encourage you to try it.

Taking advantage of the fact that the first virtual SARS-CoV-2 Biohackathon was about to be held, we signed up with the goal of including our workflow among those presented at the hackathon. It was then that the nf-core community, a European community that develops standardized workflows using Nextflow, came across our draft Nextflow workflow for analyzing SARS-CoV-2 sequencing data. They contacted us to offer a collaboration, so that together we would take part in this virtual Biohackathon and develop, efficiently and reliably, an analysis workflow capable of generating consensus genomes from Illumina coronavirus sequencing.

For me and my workmates this offer was a real honor, and we accepted without hesitation, without being clear about what it involved or how the joint work with the nf-core community would be organized. The first step was to create a GitHub repository from which we could all work from our homes (since we were confined) and from all over the world (since it is a European community with members from different countries). Then we created a Slack channel to communicate smoothly and divide up the tasks each of us would carry out. I thought the bulk of the work was already done, since in our group we had already chosen the programs the workflow would include, based on our previous experience with viruses and after having compared different programs for this purpose. However, not everything would be as smooth and simple as I expected.

The first week of building this workflow took place during the Biohackathon itself, so we spent almost every hour of the day programming and testing code non-stop, sharing it with the other participants and running lots of trial-and-error analyses. Since we were taking part in the event, we expected to spend many hours in a row programming, and we accepted it gladly. However, the Biohackathon ended and our workflow was not quite finished, so we kept working on it as part of our routine as bioinformaticians.

This is where the hardest part began, since we were expected to keep contributing to the development of this workflow at the same level as before, while also carrying on with our usual work of providing services to a national health center. Fortunately we were all still locked up at home, so no hours of the day were lost commuting, or going to the gym, and sometimes it was not even necessary to shower every day (you have all done it, do not deny it), so we had many more hours each day to face the workload that had fallen on us. We worked weekends and holidays while we watched the rest of the population (and even the non-bioinformatic scientific community) take up baking, yoga, or a new language. We could not afford that luxury.

Some days we were lucky enough to go physically to the center to set up a sequencer, and we had an excuse not only to leave the house but also to lift our heads from the screen for a few moments, for a reason that was unavoidable and could not be postponed. The days went by, and so did a quarantine. Between laughter and tears (yes, there were also tears), we began to enter the de-escalation phases, and the workflow was almost finished; all that remained was to test that there were no code errors. We could finally set aside part of our day to go out for a walk with the rest of the normal people while the code was being tested; we would fix things when we got back, during the time slot the government allowed us.

Thus, together with the lifting of the state of alarm in Spain, the first official version of our workflow for building SARS-CoV-2 consensus genomes saw the light of day in the first week of July 2020; we called it viralrecon. It was a relief to know that the workflow was finished, and that at last people in our country could leave their region and go back to their home towns. Both my workmates and I felt an infinite sense of release, because we knew we could start devoting part of our day to leisure again.

Today, we use this Nextflow workflow to analyze all our SARS-CoV-2 sequencing data, and we are also using it to generate consensus genomes for other viruses sequenced at the center. I hope this text has helped someone feel identified, since we are one of the few trades that can keep giving 100% during a lockdown, or has helped someone discover Nextflow or nf-core and make use of them in their daily work. If anyone is curious to learn more about this workflow, about Nextflow, or about the nf-core community, you can contact us or visit any of these websites:


A Pandas’ mini course experience with some labmates at the CNIO

Javi Lanillos
PhD candidate at CNIO

Right after completing my master's degree, in June 2018, I started working as a bioinformatics technician at the Hereditary Endocrine Cancer group of the Spanish National Cancer Research Center (CNIO), where I am now doing my PhD. A couple of months before starting this job in Madrid, a classmate told me about the Python library called Pandas. One of the first things I did as a bioinformatics technician was to start using Pandas on my own, and I found myself reaching for it more and more often in daily tasks. One of my main duties was to run pipeline analyses over hundreds of tumor samples that underwent targeted NGS (Next Generation Sequencing).

This work required setting up NGS data analysis pipelines, which can be built with great workflow managers (e.g., Snakemake, Nextflow) and the very many tools that process NGS data from .fastq files to annotated VCF (Variant Call Format) files. But this was not the final report for my labmates' experiments. We needed a readable format to explore the somatic variation across hundreds of samples at once. A big data table including all variation, all samples, and all annotations of our choice would help us not just spot interesting findings, but also be aware of artifacts derived from NGS technologies, group or arrange samples with common or divergent patterns, and so on. I found it very effective to parse a multi-sample VEP (Variant Effect Predictor) annotated VCF file using Pandas and then: perform column arrangement operations, string pattern extraction, and multi-column operations for actions such as filtering (e.g., on Variant Allele Frequency, depth of sequencing coverage, type of mutation, or effect on the transcript chosen as “canonical”) or detection (which samples harbor a specific mutation and fulfill multiple conditions).
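As a rough illustration of that kind of multi-column filtering and detection (the column names and thresholds below are made up for the example, not the actual fields of a VEP-annotated VCF):

```python
import pandas as pd

# Toy variant table; the columns (VAF, DP, effect) are illustrative,
# not the exact fields of a real VEP-annotated multi-sample VCF.
variants = pd.DataFrame({
    "sample": ["S1", "S1", "S2", "S2"],
    "gene":   ["SDHB", "TP53", "SDHB", "VHL"],
    "VAF":    [0.45, 0.02, 0.38, 0.51],
    "DP":     [250, 30, 180, 60],
    "effect": ["missense", "synonymous", "missense", "stop_gained"],
})

# Multi-column filtering: keep calls with enough allele frequency and depth
filtered = variants[(variants["VAF"] >= 0.05) & (variants["DP"] >= 50)]

# Detection: which samples harbor a missense mutation in SDHB?
hits = filtered.loc[
    (filtered["gene"] == "SDHB") & (filtered["effect"] == "missense"),
    "sample",
].tolist()
```

The same pattern scales from this four-row toy table to the hundreds of samples mentioned above; only the boolean conditions change.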

Excel is extremely useful, but limited for some operations

The previous paragraph is a long explanation for just a single task, but it is the one that got me into the Pandas library. Another day, a labmate asked me to help merge the data from two experiments based on the EPIC 450K Methylation Arrays. The data consisted of two different data frames (each row was a methylation CpG island, circa half a million rows) with different samples, which had to be merged into a single dataframe based on CpG probe ids. Some labmates used to apply a vertical lookup (“VLOOKUP”) search in Excel, but these large files required a lot of RAM and would freeze their screens or end up crashing their computers. In addition, they had to apply this operation manually as many times as there were samples to merge. The Pandas library offers the .concat(), .merge(), .join(), and .append() functions, which will help you find a way of merging your data. I also often combined the .map() function with Python dictionaries to bring external information into a dataframe. Whenever I need to quickly check some data in a dataframe, I load the file with Pandas in the shell using an iPython notebook. For example, the .groupby() function helps to summarize data in countless ways; I usually parse text by combining self-made Python functions (def), .str, .apply(), and lambdas; and I convert plain text to .xlsx or find a fast way to plot and share results with my labmates.
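A minimal sketch of that methylation merge, with toy probe ids instead of real EPIC array data: a single .merge() on the probe id column replaces the per-sample VLOOKUPs, and .map() plus a dictionary annotates the result.

```python
import pandas as pd

# Two toy batches of methylation values, keyed by CpG probe id
batch1 = pd.DataFrame({
    "probe_id": ["cg0001", "cg0002", "cg0003"],
    "sample_A": [0.12, 0.85, 0.40],
})
batch2 = pd.DataFrame({
    "probe_id": ["cg0002", "cg0003", "cg0004"],
    "sample_B": [0.90, 0.35, 0.60],
})

# One merge on probe_id does what repeated VLOOKUPs would;
# how="inner" keeps only the probes present in both batches
merged = batch1.merge(batch2, on="probe_id", how="inner")

# .map() + a dict brings in external information (chromosomes here
# are invented for the example)
chrom = {"cg0002": "chr1", "cg0003": "chr7"}
merged["chrom"] = merged["probe_id"].map(chrom)
```

With half a million rows per batch, the same two calls still run in seconds, and adding a third batch is just one more .merge() rather than another round of manual lookups.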

import pandas as pd all day, import pandas as pd all night

After two and a half years at the CNIO, Pandas is imported at the very top of most of my Python scripts. Pandas is helpful to parse and extract data from diverse bioinformatics formats (e.g., .gtf) and transform them into other formats (e.g., .bed). It is also at the core of some Python scripts that I run in parallel to analyze hundreds of dataframes, which are finally merged into one. Now, I keep trying to learn more about transforming data frames in countless ways (e.g., by combining multiple Pandas functions such as .pivot(), .groupby(), .loc, etc.).
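For instance, a toy reshaping along those lines (hypothetical mutation counts, just to show .groupby(), .pivot(), and .loc working together) might be:

```python
import pandas as pd

# Long-format table of per-sample mutation counts (invented numbers)
long = pd.DataFrame({
    "sample": ["S1", "S1", "S2", "S2"],
    "gene":   ["TP53", "KRAS", "TP53", "KRAS"],
    "n_mut":  [2, 1, 0, 3],
})

# .groupby() summarizes: total mutations per sample
totals = long.groupby("sample")["n_mut"].sum()

# .pivot() reshapes long -> wide: one row per sample, one column per gene
wide = long.pivot(index="sample", columns="gene", values="n_mut")

# .loc selects by label on the reshaped frame
s1_tp53 = wide.loc["S1", "TP53"]
```

The wide table is the kind of sample-by-gene matrix that is convenient to scan by eye or export to .xlsx for labmates.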

“I realized that I was getting used to talking about any project and mentioning the Pandas library at some point [...]. So, after hearing about some daily issues with dataframes, I started to ask my colleagues the other way around: what actions do you usually perform on dataframes?”

The answers ranged from the “VLOOKUP” Excel function to merge data tables, formatting strings, manually curating data, repeated manual searches for gene names in .xlsx tables, applying statistical tests, and sorting data, to a loooong etcetera; what we all had in common was the frequent use of data frames to explore data. From PhD students to senior researchers, from pure “wet-labbers” to bioinformatics wizards, I really enjoyed interacting with them and sharing time to learn about and discuss data analysis in our research.

So, why not just get to know the Pandas library? And how could we all participate without spending much time?

The safest (in this pandemic situation) and most flexible option was an online minicourse, although I feared it might not be the most interactive and intuitive channel to learn through. However, we had the advantage that we all knew each other, which could help increase the interaction. Over two weeks, I took some time to write down the activity and work out solutions to certain aspects and potential issues. I wanted the audience to be as broad as possible, but I took into special consideration those friends who were not used to any programming language, or to opening the “Terminal”.

Second, I asked some colleagues what issues they had faced when doing data frame analysis, and I decided to make up a toy exercise in which we could use some Pandas utilities and go through those issues. Another option would have been to explain several separate scenarios and solve them one at a time. I can even imagine a third option, in which participants pool their needs or issues beforehand and we work out exercises that provide ideas or solutions.

Also, regarding technical issues, there are more ways of executing Pandas (and Python) than I really know. So, limited to what I have used and heard of, I thought about how we were all going to work during the activity, or at least how to offer a simple way for everyone. I reckoned it was easy to install Miniconda on any operating system and get the necessary Python libraries with Conda or pip. A key technical tip from a colleague at the Bioinformatics Unit of the CNIO (thanks, Tomás) was to install Miniconda without admin privileges, since some people would be working at the CNIO during the training, and admin privileges are reserved for the IT department. I also found a very nice blog which explains very well how to do it.

Once the technical preparations were done, I emailed all members (from PIs to students) of the labs associated with the Human Cancer Genetics Program at the CNIO and invited them to sign up if they liked the idea. These colleagues are close to me because we all take part in the weekly Lab Meeting and Journal Club sessions, so it felt natural to invite them to this first attempt at a Pandas minicourse and learn from the experience myself. After getting the list of participants, I sent them another email with installation instructions and links to some data I had prepared beforehand.

How did it go? Me: It was great!

I chose a cancer-research-related topic and proposed to load publicly available RNAseq count tables from the TCGA-BRCA (The Cancer Genome Atlas – Breast Cancer) cohort at the UCSC Xena Browser and explore the PAM50 classification. In brief, PAM50 classifies a breast cancer tumor into a molecular subtype based on a specific gene expression signature derived from microarray data, which confers invaluable prognostic information in the clinic. So the proposal was: based on RNAseq counts from TCGA-BRCA, could we turn the expression of the PAM50 genes into a 2-D plot showing the first two Principal Component Analysis axes? This way, we could practice reading dataframes, extract the information we needed using Pandas, make use of other Python libraries such as scikit-learn, and end our exercise with a figure made with Matplotlib.
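A condensed, self-contained sketch of that exercise on random numbers (the actual minicourse used real TCGA-BRCA counts and scikit-learn's PCA; here the projection is done with a plain NumPy SVD so the example has no extra dependencies):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy expression matrix: 100 samples x 50 "PAM50-like" genes
# (random values standing in for real RNAseq counts)
expr = pd.DataFrame(
    rng.normal(size=(100, 50)),
    columns=[f"gene{i}" for i in range(50)],
)

# PCA via SVD: center the data, decompose, project onto the
# first two principal components
X = expr.to_numpy()
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T   # shape (100, 2): PC1 and PC2 per sample

# In the minicourse, pcs[:, 0] vs pcs[:, 1] was then drawn as a
# scatter plot with Matplotlib, colored by PAM50 subtype
```

With real data, `sklearn.decomposition.PCA(n_components=2).fit_transform(X)` gives the same projection in one call; the SVD version just makes the linear algebra visible.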

Thirty people from seven labs signed up for either the first or the second day, or both! During the two-hour Zoom meeting, I quickly went through some slides with the installation instructions and the motivation for the course, and introduced the exercise. Then I asked the participants to execute the code in their terminals while I explained step by step what we were doing. I asked them to interrupt me at any moment, and they also used the “chat” option in Zoom to copy and paste execution errors. During the first session, as part of the usual live-coding streaming issues, we found, for example, that some people had to replace some of the single quotation marks ('') in the code with double ones (“”). It seemed to be related to their specific OS and Miniconda distribution versions.

Okay, but how did it go for them?

Overall, through a brief, anonymously answered survey to collect their opinions and impressions, the general feeling of the 15 respondents about the content/duration balance was “okay” (not too short, not too long). Most of them were undergraduate, graduate, or PhD students, and there were some senior researchers and technicians too. I was glad to see high heterogeneity in their programming expertise (in general, Python, R…), from users who only use Excel to “fair enough/advanced” programmers, including some whose programming skills were getting rusty and who wanted to refresh them. “Excel” was the most common way to load and work with dataframes in their daily activities, followed by minor cases of “plain text”, “comma-separated files”, “dataframes in general using R”, and dealing with “TSV” and “VCF” files. Of course, in Human Cancer Genetics labs, we work with dataframes for mutation, gene expression, or clinical data analysis. 60% of them used Windows, and 70% had heard of Miniconda, the terminal, and Python before (actually, all of them knew of Python). Only 40% had also heard of the Matplotlib and/or scikit-learn libraries. As expected, more than 90% of the respondents had worked with dataframes before, although 2 of them had not.

We also had an interesting debate about the use of Python instead of R. I am much more confident programming in Python, since it is the language I practiced most during my master's, and I confess that Pandas has firmly fixed my preference for Python (when talking about dataframes). I am certainly not experienced enough to settle the matter, and besides, bioinformatics/IT forums are flooded with posts about this topic; if you want to know more, you can start with this one on Biostars. As a bioinformatics PhD student, my aim is to expand my programming skills and solve my research questions using the smartest solution (by “smartest”, I mean “easiest”, “most computationally powerful”, “as close as possible to the state of the art”, “more robust”, …). Who said programming is addictive? The more programming I learn, the more time I am willing to spend on it. Someone asked an interesting question about parsing larger (say, ~20 GB) VCF files using Pandas. This brought up other issues, such as manually modifying a VCF, which can break its formatting. I would rather use other bioinformatics tools to handle big VCF files, and I would even split and parallelize the problem. But if, at some point, a few lines of Python were to make my day, apart from trying Pandas I am starting to read about other libraries for “larger data”, such as Dask, Vaex, and PySpark. However, this topic is far beyond an “introductory minicourse to Pandas”. Finally, some days later, I learned that some participants could not finish the whole exercise, for various reasons and errors while executing the code. Of course, I wished all of them had reached the final figure, but I am happy that all of them were willing to spend a two-hour slot learning how to open the terminal, install Miniconda, discover iPython, and run some code using Pandas.
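For the ~20 GB question, one Pandas-only option worth mentioning is chunked reading, so only one piece of the file sits in RAM at a time. The sketch below runs on an in-memory buffer instead of a real VCF, and the column names are invented:

```python
import io
import pandas as pd

# Stand-in for a huge tab-separated file on disk; with a real file this
# would be pd.read_csv("big.tsv", sep="\t", chunksize=...)
text = "pos\tDP\n" + "\n".join(f"{i}\t{i % 100}" for i in range(1000))

deep = 0
for chunk in pd.read_csv(io.StringIO(text), sep="\t", chunksize=200):
    # Filter each chunk independently and keep only the running summary,
    # never the whole table
    deep += (chunk["DP"] >= 50).sum()
```

This keeps memory bounded by the chunk size, though for genuinely huge VCFs dedicated tools (or the libraries named above) remain the better fit.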

Some final and cheering thoughts

Two main points for me to reflect on are what kind of practical exercises participants were keener on, and how to make sure the main concepts were clear after the exercise. Regarding the first question, preferences varied. I offered them a choice: a) I prefer to learn one Pandas utility at a time and end up with a “Pandas cheat sheet”; b) I prefer to practice with a made-up exercise such as the one proposed in this minicourse; c) I would bring my own examples and problems and practice over possible solutions; or d) all of the above. I am glad that 65% were open to “all of them”, but I would also be really interested to see how they could have been introduced to Pandas through the other approaches.

Big thanks to all the participants who made this initiative possible. Whether you use it or not, Pandas may become more or less useful for each of you. I do think we accomplished the main goal of getting introduced to this wonderful Python library and, also, of sharing some time to get to know each other's work, which I am sure is the way to accelerate our work in cancer research.

The main message: the Pandas library can be quite helpful in your daily data analysis projects!

#MeetABioinformatician – Sara Monzón

Miguel Rodríguez
PhD candidate at CRG
Sarai Varona
Bioinformatician at ISCIII
Javi Lanillos
PhD candidate at CNIO

Sara Monzón is a bioinformatics analyst at the Instituto de Salud Carlos III. She is also the current leader of RSG-Spain. In this interview she tells us a bit about her day-to-day work and her view of bioinformatics.

What is #MeetABioinformatician?

#MeetABioinformatician is an RSG-Spain initiative in which we showcase different people involved in bioinformatics in Spain. If you would like to promote your work, get in touch with us through our contact form or email address. We look forward to hearing from you!


6th European Student Council Symposium (ESCS)

The 6th European Student Council Symposium (ESCS) was co-organized by the Spanish National Bioinformatics Institute (INB/ELIXIR-ES) and the Barcelona Supercomputing Center (BSC), with the participation of RSG-Spain. The conference took place virtually on 6 September 2020, as a satellite event of the European Conference on Computational Biology (ECCB2020).

The ESCS2020 brought together 344 participants from 19 different countries. This event consisted of two poster sessions, two keynote speaker talks, six oral talks and four flash talks carried out by students, a round table about mental health in academia and an online social event.

Students submitted 74 abstracts, of which 10 were chosen for oral and flash talk presentations:


The program at a glance:

Keynote by Patrick Aloy: “Extending the small molecule principle to all levels of biology”. Institute for Research in Biomedicine, Barcelona, Spain.

Nikolina Sostaric: “Molecular dynamics shows complex interplay and long-range effects of post-translational modifications in yeast protein interactions”. KU Leuven, Belgium.

Miguel Rodriguez Galindo: “Germline de novo mutation rates on exons versus introns in humans”. Center for Genomic Regulation, Spain.

Loic Lannelongue: “The Green Algorithms project: quantifying the carbon impact of bioinformatics and genomics”. Cambridge-Baker Systems Genomics Initiative (Department of Public Health and Primary Care, University of Cambridge), United Kingdom.

Edgar Garriga Nogales: “The Regressive alignment. A new horizon for the MSA”. Center for Genomic Regulation, Spain.

Asma Yamani: “A sequence-based approach to classify activation/inhibition relationship of protein-protein interaction (PPI) using machine learning techniques”. King Fahd University of Petroleum and Minerals, Saudi Arabia.

Ina Maria Deutschmann: “Effects of environmental selection on the global-ocean microbial interactome”. Institute of Marine Sciences, Spain.

Sabrina Ali: “Exploration of SARS-CoV-2 Main Protease Inhibitors from Existing Antiviral Drugs”. Jahangirnagar University, Bangladesh.

Feibo Duan: “Interactive visualization of 3D biological model in holographic pyramid”. Leiden Institute of Advanced Computer Science (LIACS), Leiden University, the Netherlands.

Raul Pérez-Moraga: “A drug repurposing strategy for COVID-19 treatment based on topological data analysis (TDA) and persistent homology-derived protein-protein similarities”. ESI International Chair@CEU-UCH, Spain.

Kishan KC: “Interpretable sparse encoding of sequences for protein-protein interaction prediction”. Rochester Institute of Technology, United States.

Keynote by Sofia Forslund: “Confounder-aware systems in medical approaches”. Max Delbruck Center for Molecular Medicine, Berlin, Germany.

The round table discussion featured Sofia Forslund, Patrick Aloy, Sayane Shome and Nazeefa Fatima, who spoke about their personal experiences and strategies for dealing with mental health issues in academia.

Several members of RSG-Spain were involved in organizing the symposium. The event was originally planned to take place near Barcelona, and moving it online made the organization more challenging. The social event was prepared by Lluís Revilla, one of our social media managers and representative of the outreach committee. Alberto Langtry, a member of the ECCB2020 organizing committee, hosted a session; two members were selected to give talks, and many others attended the discussion panels, talks and keynotes. Alba Álvarez, chair of the program committee; Raquel Benitez, member of the program committee; and Sara Monzón, president of RSG-Spain, were the reviewers in charge of selecting both the oral presentations and the flash talks.


VII Bioinformatics Student Symposium



The VII Bioinformatics Student Symposium, organized by RSG-Spain and co-organized by the Instituto Nacional de Bioinformática (INB), the bioinformatics technological platform of the Instituto de Salud Carlos III (ISCIII), and the Centro de Biotecnología y Genómica de Plantas (CBGP), brought together more than 85 bioinformatics students from around Spain. The conference was held on 17th October 2019 in Madrid, as a satellite event of the “II Jornadas de Bioinformática y Biología Computacional de la Comunidad de Madrid” (II BioinfoCAM) organized by CBGP (UPM-INIA).

The program at a glance:


VI Bioinformatics Student Symposium

The VI Bioinformatics Student Symposium, organized by RSG-Spain, brought together more than 40 bioinformatics students from around Spain. The conference was held on 13th November 2018 in Granada, as a satellite event of the XIV Symposium on Bioinformatics (JBI2018) organized by the University of Granada, the Centre for Genomics and Oncological Research (GENyO), the Barcelona Supercomputing Center (BSC) and the Spanish National Bioinformatics Institute (INB/ELIXIR-ES).

The program at a glance: 




V Bioinformatics Student Symposium

The V Bioinformatics Student Symposium, organized by RSG-Spain, brought together more than 80 bioinformatics students from around Spain. The conference was held on 12th May 2017 at the Barcelona Biomedical Research Park (PRBB) in Barcelona.

The program at a glance:

Understanding diseases through bioinformatics. 3 talks from selected abstracts

Working out the structures of life. 3 talks from selected abstracts

Tool development. 3 talks from selected abstracts

Networking with omics. 3 talks from selected abstracts